Lag
Based on the IEEE 802.1ax standard (formerly 802.3ad), Link Aggregation Groups (LAGs) can
be configured to increase the bandwidth available between two network devices, depending on
the number of links installed. LAG also provides redundancy in the event that one or more links
participating in the LAG fail. All physical links in a given LAG combine to form one logical
interface.
Packet sequencing must be maintained for any given session. The hashing algorithm deployed
by the Alcatel-Lucent routers is based on the type of traffic transported to ensure that all traffic
in a flow remains in sequence while providing effective load sharing across the links in the LAG.
LAGs must be statically configured or formed dynamically with Link Aggregation Control
Protocol (LACP). The optional marker protocol described in IEEE 802.1ax is not implemented.
LAGs can be configured on network and access ports.
The LAG load sharing is executed in hardware, which provides line rate forwarding for all port
types.
The LAG implementation supports LAGs in which all member ports have the same speed, as well as
LAGs with mixed port-speed members (see later section for details).
The LAG implementation is supported on access and network interfaces.
LACP
Under normal operation, all non-failing links in a given LAG become active and traffic is load
balanced across all active links. In some circumstances, however, this is not desirable. Instead,
it is desired that only some of the links are active (for example, all links on the same IOM) and the
other links be kept in stand-by condition.
LACP enhancements allow active lag-member selection based on particular constraints. The
mechanism is based on the IEEE 802.1ax standard so interoperability is ensured.
To use LACP on a given LAG, an operator must enable LACP on the LAG and, if desired,
select a non-default LACP mode (active or passive) and configure the administrative key to be used
(configure lag lacp). In addition, an operator can configure the desired LACP transmit interval
(configure lag lacp-xmit-interval).
When LACP is enabled, an operator can see LACP changes through traps/log messages
logged against the LAG. See the TIMETRA-LAG-MIB.mib for more details.
LACP Multiplexing
The router supports two modes of multiplexing RX/TX control for LACP: coupled and
independent.
In coupled mode (default), both RX and TX are enabled or disabled at the same time whenever
a port is added or removed from a LAG group.
In independent mode, RX is first enabled when a link state is UP. LACP sends an indication to
the far-end that it is ready to receive traffic. Upon the reception of this indication, the far-end
system can enable TX. Therefore, in independent RX/TX control, LACP adds a link into a LAG
only when it detects that the other end is ready to receive traffic. This minimizes traffic loss that
might occur in coupled mode if a port is added into a LAG before notifying the far-end system or
before the far-end system is ready to receive traffic. Similarly, on link removals from LAG, LACP
turns off the distributing and collecting bit and informs the far-end about the state change. This
allows the far-end side to stop sending traffic as soon as possible.
Independent control provides for lossless operation for unicast traffic in most scenarios when
adding new members to a LAG or when removing members from a LAG. It also reduces loss for
multicast and broadcast traffic.
Note that independent and coupled mode are interoperable (connected systems can have either
mode set).
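The difference between the two multiplexing modes can be sketched as a small decision function. This is an illustrative model only, not the router's implementation: the names LinkState, partner_collecting and mux_state are hypothetical, and "link up" stands in for "port added to the LAG".

```python
from dataclasses import dataclass


@dataclass
class LinkState:
    link_up: bool             # physical link state of the member port
    partner_collecting: bool  # far end has signalled it is ready to receive


def mux_state(mode: str, link: LinkState) -> tuple[bool, bool]:
    """Return (rx_enabled, tx_enabled) for one LAG member link."""
    if not link.link_up:
        return (False, False)
    if mode == "coupled":
        # RX and TX are switched on together, regardless of the far end.
        return (True, True)
    if mode == "independent":
        # RX is enabled first; TX waits for the far end's readiness indication.
        return (True, link.partner_collecting)
    raise ValueError(f"unknown mode: {mode}")
```

In this model, an independent-mode link that is up but has not yet heard from its partner receives but does not transmit, which is the window in which coupled mode could lose traffic.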
In case of a link failure, as shown in Figure 28 and Figure 29, the switchover behavior ensures
that all lag-members connected to the same IOM as the failing link will become stand-by and lag-
members connected to the other IOM will become active. This way, QoS enforcement constraints
are respected, while the maximum number of available links is utilized.
SAP Queues    | % # local links (a) | 100% rate      | 100% rate (SAP hash to one link) or % # all links (b) (SAP hash to all links) | 100% rate (SAP hash to one link) or % # local links (a) (SAP hash to all links)
SAP Scheduler | % # local links (a) | 100% bandwidth | 100% rate (SAP hash to one link) or % # all links (b) (SAP hash to all links) | 100% bandwidth (SAP hash to one link) or % # local links (a) (SAP hash to all links)
Per-fp-ing-queuing
Per-fp-ing-queuing optimization for LAG ports provides the ability to reduce the number of
hardware queues assigned on each LAG SAP on ingress when the flag at LAG level is set for
per-fp-ing-queuing.
When the feature is enabled in the config>lag>access context, the queue allocation for SAPs
on a LAG will be optimized and only one queuing set per ingress forwarding path (FP) is
allocated instead of one per port.
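The saving can be illustrated with simple counting. The sketch below is a hedged illustration, not vendor code: the function name and the port-to-FP mapping are hypothetical, and it only counts queuing sets per SAP.

```python
def ingress_queue_sets(member_ports: dict[str, str], per_fp_ing_queuing: bool) -> int:
    """Count ingress queuing sets allocated for one SAP on a LAG.

    member_ports maps a port id to the forwarding path (FP) it sits on;
    both labels are made up for this example.
    """
    if per_fp_ing_queuing:
        # One queuing set per distinct ingress forwarding path.
        return len(set(member_ports.values()))
    # Without the optimization: one queuing set per member port.
    return len(member_ports)


# Four member ports spread over two forwarding paths (hypothetical layout).
example_ports = {"1/1/1": "fp-1", "1/1/2": "fp-1", "2/1/1": "fp-2", "2/1/2": "fp-2"}
```

With this layout the SAP needs four queuing sets without the flag and two with it.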
The following rules will apply for configuring the per-fp-ing-queuing at LAG level:
To enable per-fp-ing-queuing, the LAG must be in access mode
The LAG mode cannot be set to network mode when the feature is enabled
Per-fp-ing-queuing can only be set if no port members exist in the LAG
Per-fp-ing-queuing cannot be set if LAG’s port-type is hsmda
Per-fp-egr-queuing
Per-fp-egr-queuing optimization for LAG ports provides the ability to reduce the number of
egress resources consumed by each SAP on a LAG, and by any encap groups that exist on
those SAPs.
When the feature is enabled in the config>lag>access context, the queue and virtual scheduler
allocation will be optimized. Only one queuing set and one H-QoS virtual scheduler tree per
SAP/encap group will be allocated per egress forwarding path (FP) instead of one set per each
port of the LAG. In case of a link failure/recovery, egress traffic uses failover queues while the
queues are moved over to a newly active link.
Per-fp-egr-queuing can be enabled on existing LAG with services as long as the following
conditions are met.
The LAG’s mode must be access or hybrid.
The LAG’s port-type must be standard.
The LAG must have either per-link-hash enabled or all SAPs on the LAG must use per-
service-hashing only and be of a type: VPLS SAP, i-VPLS SAP, or e-Pipe VLL or PBB SAP.
The system must be, at minimum, in chassis mode d (configure>system>chassis-mode)
To disable per-fp-egr-queuing, all ports must first be removed from a given LAG.
Per-fp-sap-instance
Per-fp-sap-instance optimization for LAG ports provides the ability to reduce the number of SAP
instance resources consumed by each SAP on a LAG.
When the feature is enabled, in the config>lag>access context, a single SAP instance is
allocated on ingress and on egress per each forwarding path instead of one per port. Thanks to
an optimized resource allocation, the SAP scale on a line card will increase, if a LAG has more
than one port on that line card. Because SAP instances are only allocated per forwarding path
complex, hardware reprogramming must take place when, as a result of LAG links going down or
up, a SAP is moved from one LAG port on a given line card to another port on a given line card
within the same forwarding complex. This results in an increased data outage when compared
to the per-fp-sap-instance feature being disabled. During the reprogramming, failover queues are
used while SAP queues are reprogrammed to a new port. Any traffic using failover queues will
not be accounted for in SAP statistics and will be processed at best-effort priority.
The following rules apply when configuring a per-fp-sap-instance on a given LAG:
Minimum chassis mode D is required.
Per-fp-ing-queuing and per-fp-egr-queuing must be enabled.
The functionality can be enabled/disabled on LAG with no member ports only. Services can be
configured.
Other caveats:
SAP instance optimization applies to LAG-level. Whether a LAG is sub-divided into sub-groups
or not, the resources are allocated per forwarding path for all complexes LAG’s links are
configured on (i.e. irrespective of whether a given sub-group a SAP is configured on uses that
complex or not).
Egress statistics continue to be returned per port when SAP instance optimization is enabled. If
a LAG’s links are on a single forwarding complex, all ports but one will have no change in
statistics for the last interval – unless a SAP moved between ports during the interval.
Rollback that changes per-fp-sap-instance configuration is service impacting.
LSR Hashing
The LSR hash routine operates on the label stack only. However, there is also the ability to
hash on the IP header if a packet is IP. An LSR will consider a packet to be IP if the first nibble
following the bottom of the label stack is either 4 (IPv4) or 6 (IPv6). This allows the user to
include an IP header in the hashing routine at an LSR for the purpose of spraying labeled IP
packets over multiple equal cost paths in ECMP in an LDP LSP and/or over multiple links of a
LAG group in all types of LSPs.
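The first-nibble check described above can be sketched as follows. This is a minimal illustration and not the router's hash routine: the function names are hypothetical, and the 4-byte label entry layout (20-bit label, 3-bit traffic class, 1-bit bottom-of-stack flag, 8-bit TTL) is the standard MPLS encoding rather than something stated in this section.

```python
import struct
from typing import Optional


def label_entry(label: int, bottom: bool, ttl: int = 64) -> bytes:
    """Build one 4-byte MPLS label stack entry (traffic class left at 0)."""
    return struct.pack("!I", (label << 12) | (int(bottom) << 8) | ttl)


def payload_ip_version(labeled_packet: bytes) -> Optional[int]:
    """Return 4 or 6 if the payload after the label stack looks like IP, else None."""
    offset = 0
    while offset + 4 <= len(labeled_packet):
        (entry,) = struct.unpack_from("!I", labeled_packet, offset)
        offset += 4
        if entry & 0x100:  # bottom-of-stack (S) bit is set
            break
    else:
        return None  # ran out of bytes without finding the bottom of the stack
    if offset >= len(labeled_packet):
        return None  # no payload after the label stack
    first_nibble = labeled_packet[offset] >> 4
    return first_nibble if first_nibble in (4, 6) else None
```

Because the check looks only at one nibble, a non-IP payload whose first byte happens to start with 4 or 6 (for example an Ethernet frame carried without a control word) is also classified as IP by this sketch.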
The user enables the LSR hashing on label stack and/or IP header by entering the following
system-wide command: config>system>load-balancing>lsr-load-balancing [lbl-only | lbl-
ip | ip-only]
By default, the LSR falls back to the hashing on label stack only. This option is referred to as lbl-
only and the user can revert to this behavior by entering one of the two commands:
config>system>load-balancing>lsr-load-balancing lbl-only
config>system>load-balancing>no lsr-load-balancing
The user can also selectively enable or disable the inclusion of label stack and IP header in the
LSR hash routine on a specific network interface by entering the following command:
config>router>interface>load-balancing>lsr-load-balancing [lbl-only | lbl-ip | ip-only]
This provides some control to the user such that this feature is disabled if labeled packets
received on a specific interface include non IP packets that can be confused by the hash routine
for IP packets. These could be VLL and VPLS packets without a PW control word.
When the user performs the no form of this command on an interface, the interface inherits the
system level configuration.
The default lbl-only hash option and the lbl-ip option with IPv4 payload are supported on all
platforms and chassis modes. The ip-only option with both IPv4 and IPv6 payloads as well as
the lbl-ip option with IPv6 payload are only supported on IP interfaces on IOM3/IMM ports.
Weighted per-link-hash
Weighted per-link-hash gives operators finer control over the distribution of SAPs/interfaces/subscribers
across LAG links when significant differences in SAPs/interfaces/subscribers bandwidth
requirements could lead to unbalanced bandwidth utilization over LAG egress.
The feature allows operators to configure, for each SAP/interface/subscriber on a LAG, one of
three unique classes and a weight value to be used when hashing this service/subscriber
across the LAG links. SAPs/interfaces/subscribers are hashed to LAG links such that, within
each class, the total weights of all SAPs/interfaces/subscribers on each LAG link are as close as
possible to each other.
Multiple classes allow grouping of SAPs/interfaces/subscribers by similar bandwidth class/type.
For example a class can represent: voice – negligible bandwidth, Broadband – 10 to 100 Mbps,
Extreme Broadband – 300 Mbps and above types of service. If a class and weight are not
specified for a given service or subscriber, values of 1 and 1 are used respectively.
The following algorithm is used to hash SAPs/interfaces/subscribers to LAG egress links:
TPSDA subscribers are hashed to a LAG link when subscribers are active, MSE
SAPs/interfaces are hashed to a LAG link when configured
For a new SAP/interface/subscriber to be hashed to an egress LAG link:
Select active link with the smallest current weight for the SAP/network/subscriber class
On a LAG link failure:
Only SAPs/interfaces/subscribers on a failed link are rehashed over the remaining active links
Processing order: Per class from lowest numerical, within each class per weight from highest
numerical value
LAG link recovery/new link added to a LAG:
auto-rebalance disabled: Existing SAPs/interfaces/subscribers remain on the currently active
links, new SAPs/interfaces/subscribers naturally prefer the new link until balance reached.
auto-rebalance is enabled: When a new port is added to a LAG a non-configurable 5 second
rebalance timer is started. Upon timer expiry, all existing SAPs/interfaces/subscribers are
rebalanced across all active LAG links minimizing the number of SAPs/interfaces/subscribers
moved to achieve rebalance. The rebalance timer is restarted if a new link is added while the
timer is running. If a port bounces 5 times within a 5 second interval, the port is quarantined
for 10 seconds. This behavior is not configurable.
On a LAG start-up, the rebalance timer is always started irrespective of auto-rebalance
configuration to avoid hashing SAPs/interfaces/subscribers to a LAG before ports have a
chance to come UP.
Weights for network interfaces are separated from weights for access
SAPs/interfaces/subscribers.
On a mixed-speed LAG, link selection is made with link speeds factoring into the overall weight
for the same class of traffic. This means that higher-speed links will be preferred over lower-
speed links.
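The placement and link-failure rules listed above can be sketched as a small model. It is a simplified illustration under stated assumptions: the class WeightedLag and its methods are hypothetical, ties are broken by link order, and auto-rebalance, the rebalance timer and mixed-speed weighting are not modelled.

```python
from collections import defaultdict


class WeightedLag:
    """Toy model of weighted per-link-hash placement on one LAG."""

    def __init__(self, links):
        self.active = list(links)
        # load[link][klass] is the total weight of that class on the link.
        self.load = {link: defaultdict(int) for link in links}
        self.placement = {}  # name -> (link, klass, weight)

    def add(self, name, klass=1, weight=1):
        # Pick the active link with the smallest current weight for this class.
        link = min(self.active, key=lambda l: self.load[l][klass])
        self.load[link][klass] += weight
        self.placement[name] = (link, klass, weight)
        return link

    def fail(self, failed):
        # Only entries on the failed link are rehashed.
        self.active.remove(failed)
        del self.load[failed]
        moved = [(n, k, w) for n, (l, k, w) in self.placement.items() if l == failed]
        # Processing order: lowest class first, highest weight first within a class.
        moved.sort(key=lambda item: (item[1], -item[2]))
        for name, klass, weight in moved:
            self.add(name, klass, weight)
```

Entries that were not on the failed link keep their placement, which matches the rule that only SAPs/interfaces/subscribers on a failed link are rehashed.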
Optionally an operator can use a tools perform lag load-balance command to manually re-
balance ALL weighted per-link-hashed SAPs/interfaces/subscribers on a LAG. The rebalance
follows the algorithm as used on a link failure moving SAPs/interfaces/subscribers to different
LAG links to minimize SAPs/interfaces/subscribers impacted.
Along with the caveats for standard per-link hashing, the following caveats exist:
When weighted per-link-hash is deployed on a given LAG, no other methods of hash for
subscribers/SAPs/interfaces on that LAG (like service hash or LAG link map profile) should be
deployed, since the weighted hash is not able to account for loads placed on LAG links by
subscriber/SAPs/interfaces using the other hash methods.
For the TPSDA model only the 1:1 (subscriber to SAP) model is supported.
This feature will not operate properly if the above conditions are not met.
per-link-hash – traffic for a given SAP/network interface will be re-hashed over the remaining
active ports of the LAG using the per-link-hashing algorithm. This behavior ensures
SAP/network interfaces using this profile will be given available resources of other active LAG
ports even if that means impacting other SAP/network interfaces on the LAG. The system will
use the QoS configuration to provide fairness and priority if congestion is caused by the default-
hash recovery.
LAG link mapping profiles can be enabled on a LAG as long as the following conditions are
met:
LAG port-type must be standard.
LAG access adapt-qos must be link or port-fair (for LAGs in mode access or hybrid)
All ports of a LAG on a given router must belong to a single sub-group.
System must be at minimum in chassis mode d (configure system chassis-mode)
Access adapt-qos mode is distribute include-egr-hash-cfg.
A LAG link mapping profile can co-exist with any other hashing used over a given LAG (for
example, per flow hashing or per-link-hashing). SAPs/network interfaces that have no link
mapping profile configured will be subject to LAG hashing, while SAPs/network interfaces that
have a LAG link mapping profile assigned will be subject to LAG link mapping behavior, which is
described above.
Through this CLI hierarchy, S-tag is implicitly associated with the inter-dest-string and
consequently with the Vport.
Link Placement
This feature requires that all active member ports in a LAG reside on the same forwarding
complex (IOM/IMM).
Multicast Consideration
Multicast traffic that is directly replicated per subscriber follows the same hashing algorithm as
the rest of the subscribers (per inter-dest-string hashing).
Multicast traffic that is redirected to a regular Layer 3 interface outside of the ESM will be
hashed per destination group (or IP address).
Note that this is only applicable to L2 ESM. In the case where this is configured and Vport
hashing is required, the following order of evaluation must be executed:
1. Hashing based on subscriber-id or inter-dest-string
2. If configured, mac-da-hashing
Hashing per inter-dest-string will win if a <Vport, subscriber> association is available at the
same time as the mac-da-hashing is configured.
The Mac-da-hashing mechanism cannot transition from a capture SAP to a derived MSAP.
LAG Upgrade
Migrating LAGs to higher speed links involves using mixed-speed LAGs initially, and later
removing lower speed links. However, a consequence is that the lower speed links in the mixed-
speed LAG set the member link limit. Even after all lower speed links are removed, the higher-
speed links maintain a higher weight, and this limits how many physical links a mixed-speed
LAG can include.
LAG upgrade support allows migration from 1GE to 10GE to 40/100GE without removing all the
ports from the LAG.
LAG upgrade support requires turning on mixed-speed LAG and adding higher speed links to
an existing LAG. Once the lower speed links are removed, the no-port-weight-
speed command is used to turn off mixed-speed LAG and to re-calibrate the number of logical
links. Figure 29 illustrates the steps in this scenario.
If a 10GE or 100GE port is allocated as 10 links, it would be converted to one link per port if all
the ports in the LAG are the same speed.
Multi-Chassis LAG
This section describes the Multi-Chassis LAG (MC-LAG) concept. MC-LAG is an extension of the
LAG concept that provides node-level redundancy in addition to the link-level redundancy provided
by “regular LAG”.
Typically, MC-LAG is deployed in a network-wide scenario providing redundant connection
between different end points. The whole scenario is then built by a combination of different
mechanisms (for example, MC-LAG and redundant pseudowire to provide e2e redundant p2p
connection or dual homing of DSLAMs in Layer 2/3 TPSDA).
Overview
Multi-chassis LAG is a method of providing redundant Layer 2/3 access connectivity that
extends beyond link level protection by allowing two systems to share a common LAG end
point.
The multi-service access node (MSAN) is connected with multiple links towards a
redundant pair of Layer 2/3 aggregation nodes such that both link and node level redundancy
are provided. By using a multi-chassis LAG protocol, the paired Layer 2/3 aggregation nodes
(referred to as the redundant-pair) appear to be a single node utilizing LACP towards the access
node. The multi-chassis LAG protocol between the redundant-pair ensures a synchronized
forwarding plane to/from the access node and is used to synchronize the link state information
between the redundant-pair nodes such that proper LACP messaging is provided to the access
node from both redundant-pair nodes.
In order to ensure SLAs and deterministic forwarding characteristics between the access and
the redundant-pair node, the multi-chassis LAG function provides an active/standby operation
towards/from the access node. LACP is used to manage the available LAG links into active and
standby states such that only links from one aggregation node are active at a time to/from the
access node.
Alternatively, when the access node does not support LACP, the power-off option can be used to
enforce active/standby operation. In this case, the standby ports are trx_disabled (power off
transmitter) to prevent usage of the lag member by the access node.
Characteristics related to MC are:
Selection of the common system ID, system-priority and administrative-key that are used in LACP
messages so partner systems consider all links as part of the same LAG.
Extension of the selection algorithm in order to allow selection of the active sub-group.
The sub-group definition in LAG context is still local to the single box, meaning that even if sub-
groups configured on two different systems have the same sub-group-id they are still
considered as two separate sub-groups within the given LAG.
Multiple sub-groups per PE in an MC-LAG are supported.
In case there is a tie in the selection algorithm, for example, two sub-groups with identical
aggregate weight (or number of active links), the group which is local to the system with the lower
system LACP priority and LAG system ID is taken.
Providing an inter-chassis communication channel allows inter-chassis communication to support
LACP on both systems. This communication channel enables the following:
Supports connections at the IP level which do not require a direct link between two nodes. The
IP address configured at the neighbor system is one of the addresses of the system (interface
or loop-back IP address).
The communication protocol provides a heartbeat mechanism to enhance the robustness of the MC-
LAG operation and to detect node failures.
Support for operator actions on any node that force an operational change.
The LAG group-ids do not have to match between neighbor systems. At the same time, there
can be multiple LAG groups between the same pair of neighbors.
Verification of the physical characteristics, such as speed and auto-negotiation, with
operator notifications (traps) initiated if errors exist. Consistency of MC-LAG configuration
(system-id, administrative-key and system-priority) is provided. Similarly, the load-balancing mode
of operation must be consistently configured on both nodes.
Traffic over the signaling link is encrypted using a user-configurable message digest key.
The MC-LAG function provides active/stand-by status to other software applications in order to build
reliable solutions.
Figure 31 and Figure 32 show the different combinations of MC-LAG attachments that are
supported. The supported configurations can be sub-divided into the following sub-groups:
Dual-homing to remote PE pairs
both end-points attached with MC-LAG
one end-point attached
Dual-homing to local PE pair
both end-points attached with MC-LAG
one end-point attached with MC-LAG
both end-points attached with MC-LAG to two overlapping pairs
The forwarding behavior of the nodes abides by the following principles. Note that the logical
destination (actual forwarding decision) is primarily determined by the service (VPLS or VLL)
and the principles below apply only if the destination or source is based on MC-LAG:
Packets received from the network will be forwarded to all local active links of the given
destination-sap based on conversation hashing. In case there are no local active links, the
packets will be cross-connected to inter-chassis pseudowire.
Packets received from the MC-LAG sap will be forwarded to active destination pseudo-wire or
active local links of destination-sap. In case there are no such objects available at the local
node, the packets will be cross-connected to inter-chassis pseudowire.
802.3ah OAM
802.3ah Clause 57 (efm-oam) defines the Operations, Administration, and Maintenance (OAM)
sub-layer, which provides mechanisms useful for monitoring link operation such as remote fault
indication and remote loopback control. In general, OAM provides network operators the ability
to monitor the health of the network and quickly determine the location of failing links or fault
conditions. efm-oam described in this clause provides data link layer mechanisms that
complement applications that may reside in higher layers.
OAM information is conveyed in slow protocol frames called OAM protocol data units
(OAMPDUs). OAMPDUs contain the appropriate control and status information used to monitor,
test and troubleshoot OAM-enabled links. OAMPDUs traverse a single link, being passed
between peer OAM entities, and as such, are not forwarded by MAC clients (like bridges or
switches).
The following efm-oam functions are supported:
efm-oam capability discovery
Active and passive modes
Remote failure indication — Handling of critical link events (link fault, dying gasp, etc.)
Loopback — A mechanism is provided to support a data link layer frame-level loopback mode.
Both remote and local loopback modes are supported
efm-oam PDU tunneling
High resolution timer for efm-oam in 100ms interval (minimum)
efm-oam link monitoring
Non-zero Vendor Specific Information Field — The 32-bit field is encoded using the format
00:PP:CC:CC and references TIMETRA-CHASSIS-MIB.
00 — Must be zeroes
PP — Platform type based on the installed IOM from tmnxHwEquippedPlatform. Mixed mode
deployments may yield different platform values in the same chassis. Since this is IOM-specific,
the IOM’s unique hardware ID (tmnxCardHwIndex) must be included to retrieve the proper
value.
CC:CC — Chassis type index value from tmnxChassisType which is indexed in
tmnxChassisTypeTable. The table identifies the specific chassis backplane.
The value 00:00:00:00 is sent for all releases that do not support the non-zero value or are
unable to identify the required elements. There is no decoding of the peer or local vendor
information fields on the network element. The hexadecimal value is included in the show
port port-id ethernet efm-oam output.
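Since the network element does not decode this field, an operator-side helper can be useful. The sketch below is a hedged illustration: the function names are hypothetical, it assumes PP occupies one byte and CC:CC two bytes in big-endian order, and the sample values are made up rather than real platform or chassis indexes.

```python
def encode_vendor_info(platform_type: int, chassis_type: int) -> str:
    """Format a 32-bit vendor-specific field as 00:PP:CC:CC."""
    if not 0 <= platform_type <= 0xFF:
        raise ValueError("platform type must fit in one byte")
    if not 0 <= chassis_type <= 0xFFFF:
        raise ValueError("chassis type must fit in two bytes")
    value = (platform_type << 16) | chassis_type  # top byte stays zero
    return ":".join(f"{b:02x}" for b in value.to_bytes(4, "big"))


def decode_vendor_info(text: str) -> tuple[int, int]:
    """Split 00:PP:CC:CC into (platform_type, chassis_type)."""
    raw = bytes(int(part, 16) for part in text.split(":"))
    if len(raw) != 4 or raw[0] != 0:
        raise ValueError("expected four bytes with a leading zero byte")
    return raw[1], int.from_bytes(raw[2:], "big")
```

A decoded result of (0, 0) corresponds to the all-zero value sent by releases that do not support the non-zero encoding.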
When the efm-oam protocol fails to negotiate a peer session or encounters a protocol failure
following an established session, the Port State will enter the Link Up condition. This port state is
used by many protocols to indicate the port is administratively UP and there is physical
connectivity but a protocol, such as efm-oam, has caused the port’s operational state to enter a
DOWN state. A reason code has been added to help discern if the efm-oam protocol is the
underlying reason for the Link Up condition.
show port
===============================================================================
Ports on Slot 1
===============================================================================
Port          Admin Link Port    Cfg  Oper LAG/ Port Port Port   C/QS/S/XFP/
Id            State      State   MTU  MTU  Bndl Mode Encp Type   MDIMDX
-------------------------------------------------------------------------------
1/1/1         Down  No   Down    1578 1578    - netw null xcme
1/1/2         Down  No   Down    1578 1578    - netw null xcme
1/1/3         Up    Yes  Link Up 1522 1522    - accs qinq xcme
1/1/4         Down  No   Down    1578 1578    - netw null xcme
1/1/5         Down  No   Down    1578 1578    - netw null xcme
1/1/6         Down  No   Down    1578 1578    - netw null xcme
# show port 1/1/3
===============================================================================
Ethernet Interface
===============================================================================
Description        : 10/100/Gig Ethernet SFP
Interface          : 1/1/3                      Oper Speed       : N/A
Link-level         : Ethernet                   Config Speed     : 1 Gbps
Admin State        : up                         Oper Duplex      : N/A
Oper State         : down                       Config Duplex    : full
Reason Down        : efmOamDown
Physical Link      : Yes                        MTU              : 1522
Single Fiber Mode  : No                         Min Frame Length : 64 Bytes
IfIndex            : 35749888                   Hold time up     : 0 seconds
Last State Change  : 12/18/2012 15:58:29        Hold time down   : 0 seconds
Last Cleared Time  : N/A                        DDM Events       : Enabled
Phys State Chng Cnt: 1
Configured Mode    : access                     Encap Type       : QinQ
Dot1Q Ethertype    : 0x8100                     QinQ Ethertype   : 0x8100
PBB Ethertype      : 0x88e7
Ing. Pool % Rate   : 100                        Egr. Pool % Rate : 100
Ing. Pool Policy   : n/a
Egr. Pool Policy   : n/a
Net. Egr. Queue Pol: default
Egr. Sched. Pol    : n/a
Auto-negotiate     : true                       MDI/MDX          : unknown
Oper Phy-tx-clock  : not-applicable
Accounting Policy  : None                       Collect-stats    : Disabled
Acct Plcy Eth Phys : None                       Collect Eth Phys : Disabled
Egress Rate        : Default                    Ingress Rate     : Default
Load-balance-algo  : Default                    LACP Tunnel      : Disabled
Down-when-looped   : Disabled                   Keep-alive       : 10
Loop Detected      : False                      Retry            : 120
Use Broadcast Addr : False
Sync. Status Msg.  : Disabled                   Rx Quality Level : N/A
Tx DUS/DNU         : Disabled                   Tx Quality Level : N/A
SSM Code Type      : sdh
Down On Int. Error : Disabled
CRC Mon SD Thresh  : Disabled                   CRC Mon Window   : 10 seconds
CRC Mon SF Thresh  : Disabled
Configured Address : d8:ef:01:01:00:03
Hardware Address   : d8:ef:01:01:00:03
The operator also has the opportunity to decouple the efm-oam protocol from the port state and
operational state. In cases where an operator wants to remove the protocol, monitor the
protocol only, migrate, or make changes, the ignore-efm-state command can be configured in
the port>ethernet>efm-oam context. When the ignore-efm-state command is configured on a
port, the protocol continues as normal. However, any failure in the protocol state machine
(discovery, configuration, time-out, loops, etc.) will not impact the port on which the protocol is
active and the optional ignore command is configured. There will only be a protocol warning
message if there are issues with the protocol. By default, when this optional command
is not configured, the port state is affected by any efm-oam protocol fault or clear
conditions. Adding or removing this optional ignore command immediately updates
the Port State and Oper State to reflect the active configuration. For example, if ignore-
efm-state is configured on a port that is exhibiting a protocol error, that protocol error does not
affect the port state or operational state and there is no Reason Down code. If ignore-efm-
state is removed from a port with an existing efm-oam protocol error, the port will transition
to Link Up, Oper Down with the reason code efmOamDown.
OAM Events
The Information OAMPDU is transmitted by each peer at the configured intervals. This
OAMPDU performs keepalive and critical notification functions. Various local conditions are
conveyed through the setting of the Flags field. The following Critical Link Events defined in IEEE
802.3 Section 57.2.10.1 are supported:
Link Fault: The PHY has determined a fault has occurred in the receive direction of the local
DTE
Dying Gasp: An unrecoverable local failure condition has occurred
Critical Event: An unspecified critical event has occurred
The local node can set and unset the various Flag fields based on the operational state of the
port, shutdown or activation of the efm-oam protocol or locally raised events. These Flag fields
maintain the setting for the continuance of a particular event. Changing port conditions, protocol
state or operator intervention may impact the setting of these fields in the Information OAMPDU.
A peer processing the Information OAMPDU can take a configured action when one or more of
these Flag fields are set. By default, receiving a set value for any of the Flag fields will cause
the local port to enter the previously mentioned Link Up port state and an event will be logged. If
this default behavior is not desired, the operator may choose to log the event without affecting
the local port. This is configurable per Flag field using the options
under config>port>ethernet>efm-oam>peer-rdi-rx.
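The default reaction and the per-flag log-only override can be sketched as below. This is an illustrative model, not the router's code: the function and the flag name strings are hypothetical, and the bit positions (Link Fault in bit 0, Dying Gasp in bit 1, Critical Event in bit 2 of the Flags field) follow the IEEE 802.3 Clause 57 layout rather than anything stated in this section.

```python
# Critical link event bits in the OAMPDU Flags field (IEEE 802.3 Clause 57).
LINK_FAULT = 0x0001
DYING_GASP = 0x0002
CRITICAL_EVENT = 0x0004

FLAG_NAMES = {
    LINK_FAULT: "link-fault",
    DYING_GASP: "dying-gasp",
    CRITICAL_EVENT: "critical-event",
}


def port_reaction(flags: int, log_only: frozenset = frozenset()) -> tuple[list[str], bool]:
    """Return (events to log, whether the local port is taken to Link Up state).

    log_only holds the flag names the operator has set to log without
    affecting the port; by default every raised flag affects the port.
    """
    raised = [name for bit, name in FLAG_NAMES.items() if flags & bit]
    affects_port = any(name not in log_only for name in raised)
    return raised, affects_port
```

Every raised flag is logged in both cases; the override only decides whether the port state is affected.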
Link Monitoring
The efm-oam protocol provides the ability to monitor the link for error conditions that may
indicate the link is starting to degrade or has reached an error rate that exceeds an acceptable
threshold.
Link monitoring can be enabled for three types of frame errors: errored-frame, errored-frame-
period and errored-frame-seconds. The errored-frame monitor is the number of frame errors
compared to the threshold over a window of time. The errored-frame-period monitor is the
number of frame errors compared to the threshold over a window of a number of received
packets. This window is checked once per second to see if the window parameter has been
reached. The errored-frame-seconds monitor is the number of errored seconds compared to
the threshold over a window of time. An errored second is any second with at least one frame error.
An errored frame is counted when any frame is in error as determined by the Ethernet physical
layer, including jabbers, fragments, FCS or CRC and runts. This excludes jumbo frames with a
byte count higher than 9212, or any frame that is dropped by the phy layer prior to reaching the
monitoring function.
Each frame error monitor functions independently of other monitors. Each of monitor
configuration includes an optional signal degrade threshold sd-threshold, a signal failure
threshold sf-threshold, a window and the ability to communicate failure events to the peer by
setting a Flag field in the Information OAMPDU or the generation of the Event Notification
OAMPDU, event-notification. The parameters are uniquely configurable for each monitor.
A degraded condition is raised when the configured signal degrade sd-threshold is reached.
This provides a first-level, log-only action indicating a link could become unstable. This event
does not affect the port state. The critical failure condition is raised when the configured
sf-threshold is reached. By default, reaching the signal failure threshold will cause the port to
enter the Link Up condition unless the local signal failure local-sf-action has been modified to
a log-only action. Signal degrade conditions for a monitor in signal failed state will be
suppressed until the signal failure has been cleared.
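The threshold behavior described above can be sketched in a few lines of Python. This is an illustrative model only: the function name, the return labels and the reading of "reached" as greater than or equal are assumptions, not the router's implementation.

```python
def evaluate_window(error_count, sf_threshold, sd_threshold=None,
                    in_signal_failure=False):
    """Return the condition raised by one completed window:
    'sf' (signal failure), 'sd' (signal degrade) or None."""
    if error_count >= sf_threshold:
        return "sf"
    # sd-threshold is optional, and signal degrade is suppressed while
    # the monitor is already in the signal failed state.
    if (sd_threshold is not None and not in_signal_failure
            and error_count >= sd_threshold):
        return "sd"
    return None


print(evaluate_window(5, sf_threshold=10, sd_threshold=3))
print(evaluate_window(12, sf_threshold=10, sd_threshold=3))
```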
The initial configuration or the modification of either of the threshold values will take effect in the
current window. When a threshold value for a monitor is modified, all active local events for that
specific monitor will be cleared. The modification of the threshold acts the same as
the clear command described later in this section.
Notification to the peer is required to ensure the action taken by the local port detecting the error
and its peer are synchronized. If peers do not take the same action then one port may remain
fully operational while the other enters a non-operational state. These threshold crossing events
do not shut down the physical link or cause the protocol to enter a non-operational state. The
protocol and network element configuration is required to ensure these asymmetrical states do
not occur. There are two options for exchanging link and event information between peers:
the Information OAMPDU and the Event Notification OAMPDU.
As discussed earlier, the Information OAMPDU conveys link information using the Flags field:
dying gasp, critical link and link fault. This method of communication has a number of significant
advantages over the Event Notification OAMPDU. The Information OAMPDU is sent at every
configured transmit-interval. This allows the most recent information to be sent between
peers, a critical requirement to avoid asymmetrical forwarding conditions. A second major
advantage is interoperability with devices that do not support Link Monitoring and vendor
interoperability. This is the lowest common denominator that offers a robust communication to
convey link event information. Since the Information OAMPDU is already being sent to maintain
the peering relationship, this method of communication adds no additional overhead. The
local-sf-action options allow the dying gasp and critical event flags to be set in the Information
OAMPDU when a signal failure threshold is reached. It is suggested that this be used in place
of or in conjunction with the Event Notification OAMPDU.
Event Notification OAMPDU provides a method to convey very specific information to a peer
about various Link Events using Link Event TLVs. A unique Event Notification OAMPDU will be
generated for each unique frame error event. The intention is to provide the peer with the
Sequence Number, Event Type, Timestamp, and the local information that caused the
generation of the OAMPDU: window, threshold, errors, error running total and event running
total specific to the port.
Sequence Number: The unique identification indicating a new event.
Window: The size of the unique measurement period for the error type. The window is only
checked at the end. There is no mid-window checking.
Threshold: The value of the configured sf-threshold.
Errors: The errors counted in that specific window.
Error Running Total: The number of errors accumulated for that event type since monitoring
started and the protocol and port have been operational or a reset function has occurred.
Event Running Total: The number of events accumulated for that event type since the
monitoring started and the protocol and port have been operational.
By default, the Event Notification OAMPDU is generated by the network element detecting the
signal failure event. The Event Notification OAMPDU is sent only when the initial frame event
occurs. No Event Notification OAMPDU is sent when the condition clears. A port that has been
operationally affected as a result of a Link Monitoring frame error event must be recovered
manually. The typical recovery method is to shutdown and then no shutdown the port. This
will clear all events on the port. Any function that affects the port state, such as a physical fiber
pull, soft or hard reset functions, or protocol restarts, will also clear all local and remote events
on the affected node experiencing the operation. None of these frame error recovery actions will
cause the generation of the Event Notification OAMPDU. If the chosen recovery action is not
otherwise recognized by the peer and the Information OAMPDU Flag fields have not been
configured to maintain the current event state, there is a high probability that the ports will have
different forwarding states, notwithstanding any higher level protocol verification that may be in
place.
A burst of between one and five Event Notification OAMPDU packets may be sent. By default,
only a single Event Notification OAMPDU is generated, but this value can be changed under
the local-sf-action context. An Event Notification OAMPDU will only be processed if the peer
had previously advertised the EV capability. The EV capability is an indication the remote peer
supports link monitoring and may send the Event Notification OAMPDU.
The network element receiving the Event Notification OAMPDU will use the values contained in
the Link Event TLVs to determine if the remote node has exceeded the failure threshold. The
locally configured action will determine how and if the local port is affected. By default,
processing of the Event Notification OAMPDU is log only and does not affect the port state. By
default, processing of the Information OAMPDU Flag fields is port affecting. When Event
Notification OAMPDU has been configured as port affecting on the receiving node, action is
only taken when errors are equal to or above the threshold and the threshold value is not zero.
No action is taken when the errors value is less than the threshold or the threshold is zero.
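The receive-side rule above can be expressed as a short Python sketch. The LinkEvent class, the function name and the returned labels are hypothetical and only mirror the errors and threshold values carried in a Link Event TLV; this is not the actual processing code.

```python
from dataclasses import dataclass


@dataclass
class LinkEvent:
    """Hypothetical view of the two TLV values used by the decision."""
    errors: int
    threshold: int


def receiver_action(event, port_affecting=False):
    """Decide what the receiving node does with an Event Notification."""
    if not port_affecting:
        # Default behavior: reception is logged, port state is untouched.
        return "log-only"
    # Port-affecting: act only when errors >= threshold and threshold != 0.
    if event.threshold != 0 and event.errors >= event.threshold:
        return "out-of-service"
    return "no-action"


print(receiver_action(LinkEvent(errors=5, threshold=1), port_affecting=True))
```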
Symbol error monitoring, errored-symbols, is also supported but requires specific hardware
revisions and the appropriate code release. The symbol monitor differs from the frame
error monitors. Symbols represent a constant load on the Ethernet wire whether service frames
are present or not. This means the optional signal degrade threshold sd-threshold has an
additional purpose when configured as part of the symbol error monitor. When the signal
degrade threshold is not configured, the symbol monitor acts similarly to the frame error monitors,
requiring manual intervention to clear a port that has been operationally affected by the monitor.
When the optional signal degrade threshold is configured, it again represents the first level
warning. However, it has an additional function as part of the symbol monitor. If a signal failure
event has been raised, the configured signal degrade threshold becomes the equivalent to a
lowering threshold. If a subsequent window does not reach the configured signal degrade
threshold then the previous event will be cleared and the previously affected port will be
returned to service without operator intervention. This return to service will automatically clear
any previously set Information OAMPDU Flags fields set as a result of the signal failure
threshold. The Event Notification OAMPDU will be generated with the symbol error Link TLV
that contains an error count less than the threshold. This will indicate to the peer that the initial
problem has been resolved and the port should be returned to service.
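The way the signal degrade threshold doubles as a lowering threshold for the symbol monitor can be modeled with a minimal Python sketch. The function name, the example thresholds and the window counts are hypothetical; the sketch models only the state transition and not the OAMPDU generation.

```python
def symbol_monitor_step(in_signal_failure, errors, sf_threshold,
                        sd_threshold=None):
    """Return the signal-failure state after one completed window."""
    if errors >= sf_threshold:
        return True
    if (in_signal_failure and sd_threshold is not None
            and errors < sd_threshold):
        # The window did not reach sd-threshold: automatic return to service.
        return False
    # Without sd-threshold a raised failure stays until cleared manually.
    return in_signal_failure


state = False
history = []
for window_errors in [5, 120, 50, 3]:  # hypothetical per-window symbol errors
    state = symbol_monitor_step(state, window_errors,
                                sf_threshold=100, sd_threshold=10)
    history.append(state)
print(history)
```

In this run the failure is raised by the second window, persists through the third (50 errors still reach the sd-threshold of 10) and clears on the fourth.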
The errored-symbol window is a measure of time that is automatically converted into the
number of symbols for that specific medium for that period of time. The standard MIB entries
“dot3OamErrSymPeriodWindowHi” and “dot3OamErrSymPeriodWindowLo” are marked as
read-only instead of read-write. There is no way to directly configure these values. The
configuration of the window will convert the time and program those two MIB values in an
appropriate manner. Both the configured window and the number of symbols will be displayed
under the show port port-id ethernet efm-oam command.
show port 1/1/1 ethernet efm-oam
===============================================================================
Ethernet Oam (802.3ah)
===============================================================================
Admin State : up
Oper State : operational
Mode : active
Pdu Size : 1518
Config Revision : 0
Function Support : LB
Transmit Interval : 1000 ms
Multiplier : 5
Hold Time : 0
Tunneling : false
Loop Detected : false
Grace Tx Enable : true (inactive)
Grace Vendor OUI : 00:16:4d
Dying Gasp on Reset: true (inactive)
Soft Reset Tx Act : none
Trigger Fault : none
Vendor OUI : 00:16:4d (alu)
Vendor Info : 00:01:00:02
Peer Mac Address : d8:1c:01:02:00:01
Peer Vendor OUI : 00:16:4d (alu)
Peer Vendor Info : 00:01:00:02
Peer Mode : active
Peer Pdu Size : 1518
Peer Cfg Revision : 0
Peer Support : LB
Peer Grace Rx : false
Loopback State : None
Loopback Ignore Rx : Ignore
Ignore Efm State : false
Link Monitoring : disabled
Peer RDI Rx
Critical Event : out-of-service
Dying Gasp : out-of-service
Link Fault : out-of-service
Event Notify : log-only
Local SF Action                       Discovery
Event Burst      : 1                  Ad Link Mon Cap  : yes
Port Action      : out-of-service
Dying Gasp       : disabled
Critical Event   : disabled
Errored Frame                         Errored Frame Period
Enabled          : no                 Enabled          : no
Event Notify     : enabled            Event Notify     : enabled
SF Threshold     : 1                  SF Threshold     : 1
SD Threshold     : disabled (0)       SD Threshold     : disabled (0)
Window           : 10 ds              Window           : 1488095 frames
Errored Symbol Period                 Errored Frame Seconds Summary
Enabled          : no                 Enabled          : no
Event Notify     : enabled            Event Notify     : enabled
SF Threshold     : 1                  SF Threshold     : 1
SD Threshold     : disabled (0)       SD Threshold     : disabled (0)
Window (time)    : 10 ds              Window           : 600 ds
Window (symbols) : 125000000
===============================================================================
Active Failure Ethernet OAM Event Logs
===============================================================================
Number of Logs : 0
===============================================================================
===============================================================================
Ethernet Oam Statistics
===============================================================================
                                Input         Output
-------------------------------------------------------------------------------
Information                     238522        238522
Loopback Control                0             0
Unique Event Notify             0             0
Duplicate Event Notify          0             0
Unsupported Codes               0             0
Frames Lost                     0
===============================================================================
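The conversion of the errored-symbol window from time into symbols, and the way the result maps onto the two MIB objects, can be approximated with the following Python sketch. The symbol rate of 125,000,000 symbols per second is an assumption chosen because it reproduces the sample output (a 10 ds window shown as 125000000 symbols); the real rate depends on the medium. The split treats dot3OamErrSymPeriodWindowHi and dot3OamErrSymPeriodWindowLo as the upper and lower 32 bits of one 64-bit value, as the standard MIB defines them. The function names are hypothetical.

```python
def window_to_symbols(window_ds, symbols_per_second=125_000_000):
    """Convert a window in deciseconds (ds) into a symbol count.

    symbols_per_second is an assumed, medium-specific rate.
    """
    return window_ds * symbols_per_second // 10


def split_hi_lo(symbols):
    """Split a 64-bit symbol count into (Hi, Lo) 32-bit halves."""
    return symbols >> 32, symbols & 0xFFFFFFFF


symbols = window_to_symbols(10)
print(symbols, split_hi_lo(symbols))
```

A longer window, for example 600 ds at the same assumed rate, no longer fits in 32 bits, which is why the MIB carries the value in two objects.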
A clear command “clear port port-id ethernet efm-oam events [local | remote]” has been
added to clear port-affecting events on the local node on which the command is issued. When
the optional [local | remote] options are omitted, both local and remote events will be cleared
for the specified port. This command is not specific to the link monitors as it clears all active
events. When local events are cleared, all previously set Information OAMPDU Flag fields will
be cleared regardless of the cause of the event that set the Flag field.
In the case of symbol errors only, if Event Notification OAMPDU is enabled for symbol errors
and a local symbol error signal failure event exists at the time of the clear, the Event Notification
OAMPDU will be generated with an error count of zero and the threshold value reflecting the
local signal failure threshold. The fact that the error value is lower than the threshold value indicates
the local node is not in a signal failed state. The Event Notification OAMPDU is not generated in
the case where the clear command is used to clear local frame error events. This is because
frame error event monitors will only act on an Event Notification OAMPDU when the error value
is equal to or higher than the threshold value; a lower value is ignored. As stated previously, there is no
automatic return to service for frame errors.
If the clear command is used to clear remote events (events conveyed to the local node by the
peer), no notification is generated to the peer to indicate a clear function has been performed.
Since the Event Notification OAMPDU is only sent when the initial event was raised, there is no
further Event Notification and blackholes can result. If the Information OAMPDU Flag fields are
used to ensure a constant refresh of information, the remote error will be reinstated as soon as
the next Information OAMPDU arrives with the appropriate Flag field set.
Local and remote efm-oam port events are stored in the efm-oam event logs. These logs
maintain and display active and cleared signal failure and signal degrade events. These events
interact with the efm-oam protocol. This logging is different from the time-stamped events for
information logging purposes included with the system log. To view these events, the
event-log option has been added to the show port port-id ethernet efm-oam command. This includes
the location, the event type, the counter information or the decoded Network Event TLV
information, and whether the port has been affected by this active event. A maximum of 12 port events
will be retained. The first three indexes are reserved for the three Information Flag fields: dying
gasp, critical link, and link fault. The other nine indexes maintain the current state for the
various error monitors on a most-recent basis; events can wrap the indexes, dropping the
oldest event.
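The index layout of the event log can be pictured with a small Python sketch. This is one possible reading, assuming the nine monitor indexes behave as a simple wrap-around buffer; the class, the labels and the stored details are hypothetical and do not reflect the router's internal data structures.

```python
from collections import deque

RESERVED_FLAGS = ("dying-gasp", "critical-link", "link-fault")


class EfmOamEventLog:
    """Hypothetical 12-entry log: 3 reserved flag slots plus 9 monitor slots."""

    def __init__(self):
        self.flags = {name: None for name in RESERVED_FLAGS}  # indexes 1-3
        self.monitors = deque(maxlen=9)                       # indexes 4-12

    def record(self, event_type, detail):
        if event_type in self.flags:
            self.flags[event_type] = detail
        else:
            # A full deque silently drops its oldest entry on append.
            self.monitors.append((event_type, detail))

    def __len__(self):
        used_flags = sum(value is not None for value in self.flags.values())
        return used_flags + len(self.monitors)


log = EfmOamEventLog()
for flag in RESERVED_FLAGS:
    log.record(flag, "peer flag set")
for sequence in range(10):  # one more monitor event than there are slots
    log.record("errored-frame", sequence)
print(len(log), log.monitors[0])
```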
In mixed environments where Link Monitoring is supported on one peer but not the other the
following behavior is normal, assuming the Information OAMPDU has been enabled to convey
the monitor fault event. The arriving Flag field fault will trigger the efm-oam protocol on the
receiving unsupportive node to move from operational to “send local and remote”. The protocol
on the supportive node that set the Flag field to convey the fault will enter the “send local and
remote ok” state. The supportive node will maintain the Flag field setting until the condition has
cleared. The protocol will recover to the operational state once the original event has cleared;
assuming no other fault on the port is preventing the negotiation from progressing. If both nodes
were supportive of the Link Monitoring process, the protocol would have remained operational.
In summary, Link monitors can be configured for frame and symbol monitors (specific hardware
only). By default, Link Monitoring and all monitors are shutdown. When the Link Monitoring
function is enabled, the capability (EV) will be advertised. When a monitor is enabled, a default
window size and a default signal failure threshold are activated. The local action for a signal
failure threshold event is to shut down the local port. Notification will be sent to the peer using
the Event Notification OAMPDU. By default, the remote peer will not take any port action for the
Event Notification OAMPDU. The reception will only be logged. It is suggested the operator
evaluate the various defaults and configure the local-sf-action to set one of the Flag fields in
the Information OAMPDU using the info-notifications command options when fault notification
to a peer is required. Vendor-specific TLVs and vendor-specific OAMPDUs are just that,
specific to that vendor. Non-ALU vendor specific information will not be processed.
Capability Advertising
A supported capability, sometimes requiring activation, will be advertised to the peer. The EV
capability is advertised when Link Monitoring is active on the port. This can be disabled
using the optional command no link-monitoring under the
config>port>ethernet>efm-oam>discovery>advertise-capabilities context.
Remote Loopback
EFM OAM provides a link-layer frame loopback mode that can be remotely controlled.
To initiate remote loopback, the local EFM OAM client sends a loopback control OAM PDU by
enabling the OAM remote-loopback command. After receiving the loopback control OAM PDU,
the remote OAM client puts the remote port into local loopback mode.
To exit remote loopback, the local EFM OAM client sends a loopback control OAM PDU by
disabling the OAM remote-loopback command. After receiving the loopback control OAM PDU,
the remote OAM client puts the port back into normal forwarding mode.
Note that during remote loopback test operation, all frames except EFM OAM PDUs are
dropped at the local port for the receive direction, where remote loopback is enabled. If local
loopback is enabled, then all frames except EFM OAM PDUs are dropped at the local port for
both the receive and transmit directions. This behavior may result in many protocols (such as
STP or LAG) resetting their state machines.
When a port is in loopback mode, service mirroring will not work if the port is a mirror-source or
a mirror-destination.
The difference between the two is subtle but important. When an active node performs this
function it will generate an Informational TLV with the Local TLV following the successful soft
reset. When it receives the Information PDU with the Grace Ack it will send its own Information
PDU with both Local and Remote TLV completed. This will complete the protocol restart. When
a passive node is reset the passive port will wait to receive the 802.3ah OAM protocol before
sending its own Information PDU with both the Local and Remote TLV thus completing the
protocol restart.
The renegotiation process allows the node which experienced the soft reset to rebuild the
session without having to restart the session from the discovery phase. This significantly
reduces the impact of the native protocol on data forwarding.
Any situation that could cause the renegotiation to fail will force the protocol to revert to the
discovery phase and fail the graceful restart. During a Major ISSU when the EFM-OAM session
is held operational by the Grace function, if the peer MAC address of the session changes,
there will be no log event raised for the MAC address change.
The vendor-specific grace function benefits are realized when both peers support the
transmitting, receiving and processing of the vendor-specific Grace TLV. In the case of mixed
code versions, products, or vendor environments, a standard EFM-OAM message to the peer
can be used to instruct the peer to treat the session as failed. When the command
dying-gasp-tx-on-reset is active on a port, the soft reset function triggers ETH-OAM to set the dying gasp
flag or critical event flag in the Information OAMPDU. An initial burst of three Informational OAM
PDUs will be sent using a one second spacing, regardless of the protocol interval. The peer
may process these flags to affect its port state and take the appropriate action. The control of
the local port state where the soft reset is occurring is left to the soft reset function. This
EFM-OAM function does not affect local port state. If the peer has acted on the exception flags and
affected its port state, then the local node must take an action to inform the upstream nodes that
a condition has occurred and forwarding is no longer possible. Routing protocols like ISIS and
OSPF overload bits are typically used in routed environments to accomplish this notification.
This feature is similar to grace-tx-enable. Intercepting system messaging, when the feature is
active on a port (enabled both at the port and at the system level) and when the messaging
occurs, is a similar concept. However, because the dying-gasp-tx-on-reset command is not a
graceful function it is interruptive and service affecting. Using dying-gasp-tx-on-reset requires
peers to reestablish the peering session from an initial state, not rebuild the state from previous
protocol information. The transmission of the dying gasp or the critical event commences when
the soft reset occurs and continues for the duration of the soft reset.
If both functions are active on the same port, the grace-tx-enable function is preferred if the
peer is setting and sending the Vendor OUI to 00:16:4d (ALU) in the Information OAMPDU. In
this situation, the dying gasp function will not be invoked. A secondary Vendor OUI can be
configured using the grace-vendor-oui oui command for a peer with a different Vendor OUI that
supports the reception, parsing, and processing of the vendor-specific grace message instead
of the dying gasp. If only one of those functions is active on the port then that specific function
will be called. The grace function should not be enabled if the peer Vendor OUI is equal to
00:16:4d (ALU) and the peer does not support the grace function.
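The selection between the grace function and the dying gasp function described above can be summarized in a short Python sketch. The function name, argument names and return labels are hypothetical; only the OUI value 00:16:4d and the precedence rule come from the text.

```python
ALU_OUI = "00:16:4d"


def soft_reset_signalling(grace_tx_enabled, dying_gasp_on_reset,
                          peer_vendor_oui, grace_vendor_oui=None):
    """Pick which function is invoked on soft reset, or None if neither."""
    if grace_tx_enabled and dying_gasp_on_reset:
        # Grace is preferred when the peer advertises the ALU OUI or the
        # optionally configured secondary grace Vendor OUI.
        grace_ouis = {ALU_OUI}
        if grace_vendor_oui is not None:
            grace_ouis.add(grace_vendor_oui.lower())
        if peer_vendor_oui.lower() in grace_ouis:
            return "grace"
        return "dying-gasp"
    if grace_tx_enabled:
        return "grace"
    if dying_gasp_on_reset:
        return "dying-gasp"
    return None


print(soft_reset_signalling(True, True, "00:16:4d"))
```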
ETH-OAM allows generation of a fault condition by using the
trigger-fault {dying-gasp | critical-event} command. This sets the appropriate flag fields in the Information
OAMPDU and transitions a previously operational local port to Link Up. Removing this
command from the configuration stops the flags from being set and allows the port to return to
service, assuming no other faults would prevent this resumption of service. In cases where a
port must be administratively shut down, this command can be used to signal a peer using the
EFM-OAM protocol, and the session should be considered failed.
These features do not support the clearing of an IOM which does not trigger a soft reset. IOM
clearing is a forceful event that does not trigger graceful protocol renegotiation.
A number of show commands have been enhanced to help operators determine the state of
the 802.3ah OAM Grace function and whether or not the peer is generating or receiving the
Grace TLV.
System level information can be viewed using the show system information command.
show system information
===============================================================================
System Information
===============================================================================
System Name : system-name
System Type : 7750 SR-12
System Version : 11.0r4
System Contact :
System Location :
System Coordinates :
System Active Slot : A
System Up Time : 62 days, 20:29:48.96 (hr:min:sec)
…snip…
EFM OAM Grace Tx Enable: False
===============================================================================
Configuration Example
In order for the maximum length service frame to successfully travel from a local ingress SAP to
a remote egress SAP, the MTU values configured on the local ingress SAP, the SDP (GRE or
MPLS), and the egress SAP must be coordinated to accept the maximum frame size the service
can forward. For example, the targeted MTU values to configure for a distributed Epipe service
(ALA-A and ALA-B) are shown in Figure 37.
Since ALA-A uses Dot1q encapsulation, the SAP MTU must be set to 1518 to be able to accept
a 1514 byte service frame (see Table 27 for MTU default values). Each SDP MTU must be set
to at least 1514 as well. If ALA-A’s network port (2/1/1) is configured as an Ethernet port with a
GRE SDP encapsulation type, then the MTU value of network ports 2/1/1 and 3/1/1
must each be at least 1556 bytes (1514 MTU + 28 GRE/Martini + 14 Ethernet). Finally, the MTU
of ALA-B’s SAP (access port 4/1/1) must be at least 1514, as it uses null encapsulation.
Table 28 shows sample MTU configuration values.
ALA-A ALA-B
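The MTU arithmetic in the Epipe example can be checked with a few lines of Python. The constant and variable names are illustrative; the values are taken from the example text, with the 4-byte difference between 1518 and 1514 corresponding to the single Dot1q tag.

```python
SERVICE_MTU = 1514          # maximum service frame size in the example
DOT1Q_TAG = 4               # one 802.1Q tag on the Dot1q SAP
GRE_MARTINI_OVERHEAD = 28   # GRE/Martini overhead quoted in the example
ETHERNET_HEADER = 14        # Ethernet header on the network port

sap_mtu_ala_a = SERVICE_MTU + DOT1Q_TAG                                 # Dot1q SAP
sdp_mtu = SERVICE_MTU                                                   # minimum SDP MTU
network_port_mtu = SERVICE_MTU + GRE_MARTINI_OVERHEAD + ETHERNET_HEADER # ports 2/1/1, 3/1/1
sap_mtu_ala_b = SERVICE_MTU                                             # null-encapsulated SAP

print(sap_mtu_ala_a, sdp_mtu, network_port_mtu, sap_mtu_ala_b)
```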
Deploying Preprovisioned Components
When a card, CMA, MDA, XCM or XMA is installed in a preprovisioned slot, the device detects
discrepancies between the preprovisioned card type configurations and the types actually
installed. Error messages display if there are inconsistencies and the card will not initialize.
When the proper preprovisioned cards are installed into the appropriate chassis slot, alarm,
status, and performance details will display.
fabric-speed-a
The 7750 SR-12e chassis defaults to the fabric-speed-a parameter when initially deployed with
SFM5-12e. The fabric-speed-a parameter operates at 200 Gb/s per slot, which permits a mix of
FP2/FP3 based cards to co-exist.
fabric-speed-b
The fabric-speed-b parameter enables the 7750 SR-12e to operate at up to 400 Gb/s, for
which all cards in the 7750 SR-12e are required to be T3 based (FP3 IMM and/or IOM3-XP-C).
The system will not support any FP2 based cards when the chassis is set to fabric-speed-b.
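The card compatibility rule for the two fabric speeds can be written as a simple check. The sketch below is illustrative Python only: the function name and the "FP2"/"FP3" labels are hypothetical stand-ins for the card generations named in the text, not CLI values.

```python
def cards_supported(fabric_speed, card_generations):
    """Return True if every installed card is allowed at this fabric speed."""
    if fabric_speed == "fabric-speed-a":
        allowed = {"FP2", "FP3"}   # a mix of FP2 and FP3 cards may co-exist
    elif fabric_speed == "fabric-speed-b":
        allowed = {"FP3"}          # no FP2-based cards are supported
    else:
        raise ValueError(f"unknown fabric speed: {fabric_speed}")
    return all(generation in allowed for generation in card_generations)


print(cards_supported("fabric-speed-a", ["FP2", "FP3"]))
print(cards_supported("fabric-speed-b", ["FP2", "FP3"]))
```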
Figure 38: Slot, Card, MDA, and Port Configuration and Implementation Flow
Configuration Notes
The following information describes provisioning caveats:
If a card or MDA type is installed in a slot provisioned for a different type, the card will not
initialize.
A card or MDA installed in an unprovisioned slot remains administratively and operationally down
until the card type and MDA are specified.
Ports cannot be provisioned until the slot, card and MDA type are specified.
cHDLC does not support HDLC windowing features, nor other HDLC frame types such as
S-frames.
cHDLC operates in the HDLC Asynchronous Balanced Mode (ABM) of operation.
APS configuration rules:
A physical port (either working or protection) must be shut down before it can be removed from
an APS group port.
For a single-chassis APS group, a working port must be added first. Then a protection port can
be added or removed at any time.
A protection port must be shut down before being removed from an APS group.
A path cannot be configured on a port before the port is added to an APS group.
A working port cannot be removed from an APS group until the APS port path is removed.
When ports are added to an APS group, all path-level configurations are available only on the
APS port level and configuration on the physical member ports is blocked.
For APS-protected bundles, all members of a working bundle must reside on the working port of
an APS group. Similarly all members of a protecting bundle must reside on the protecting circuit
of that APS group.