1 Introduction
Industrial automation systems commonly employ a hierarchical architecture to perform designed control and automation processes [
81]. Ethernet-based fieldbus communication systems are currently dominating the automation industry, with multiple protocols and standards available [
128]. However, different vendors may select different industrial Ethernet protocols for use in their devices, resulting in incompatibilities among the deployed equipment. This phenomenon contributes to industrial automation architectures being hierarchical, custom-built, and inflexible when integrating devices from different vendors or standards [
69]. Fortunately, driven by the recent advances in Industrial Internet of Things (IIoT) technologies, many technical initiatives are pushing industrial automation applications to be more flexible, interoperable, and seamless. One of the most important requirements for industrial automation is real-time and deterministic communication, which is essential for realizing mission-critical control processes [25].
Critical traffic flows generated by industrial automation applications require bounded low latency and low jitter to improve production efficiency and reduce communication costs. Typically, these critical traffic flows need to share the communication medium (e.g., Ethernet) with non-critical flows (e.g., those with less severe timing constraints) originating from the same applications. Under these conditions, it is imperative to guarantee the timing behavior of critical traffic and provide temporal isolation from non-critical communications. The IEEE 802.1 Time-Sensitive Networking Task Group (TSN TG), evolved from the former IEEE 802.1 Audio Video Bridging (AVB) TG, addresses this need by designing general-purpose protocols applicable to various fields, such as factory automation, process automation, substation control, and aerospace applications.
The IEEE TSN TG currently aims to improve the reliability and real-time capabilities of the Ethernet standards (e.g., IEEE 802.3 and IEEE 802.1D). It focuses on several essential aspects of the IEEE AVB standards crucial for industrial automation, including reduced latency, deterministic transmission, independence from physical transmission rates, fault tolerance without additional hardware, and interoperability of solutions from different vendors. Compared to traditional Ethernet-based fieldbus systems, the advantages of TSN are manifold, including vendor neutrality, higher throughput, greater network configuration flexibility, and better scalability [98].
TSN is a collection of standards, standard amendments, and projects published or under development by the TSN TG within the IEEE 802.1
Working Group (WG). There are four main pillars on which TSN is built: (1) time synchronization, (2) guaranteed
end-to-end (e2e) latency, (3) reliability, and (4) resource management. These characteristics make TSN a strong candidate for meeting special requirements in industrial automation, such as deterministic communication, ultra-low communication latency, and extremely high reliability. While TSN standardization efforts are ongoing, several manufacturers have already demonstrated the promising performance of TSN, showing much higher determinism than current state-of-the-art solutions [
10,
82]. However, the benefits of TSN come with challenges that need to be addressed in the deployment of industrial automation systems. These challenges include stringent requirements on network synchronization precision, increased traffic scheduling complexity, integration with wireless devices, and so on.
This paper provides a comprehensive review of the current advances in standardization and research efforts related to TSN for industrial automation. We first give a systematic introduction to the published TSN standards relevant to industrial automation systems and explore the challenges each standard attempts to address. We then highlight how and to what extent these standardization efforts empower Ethernet applications, supporting the new requirements raised by current and future industrial use cases. Note that, in addition to the automation industry, deploying TSN technologies is of great interest in many other industries requiring deterministic, low-latency, and high-reliability communications, including automotive applications [11], aerospace [50], and healthcare [76], which are not the focus of this survey.
The rest of this article is organized as follows. Section
2 provides the background of industrial automation and IEEE TSN technologies. Section
3 describes the up-to-date TSN standardization efforts in detail, and Section
4 discusses the integration of TSN into industrial automation systems. Section
5 discusses the challenges in each category of TSN standards. Section
6 presents the future directions related to TSN R&D, and Section
7 concludes the article.
3 TSN Standardization
We can broadly classify the TSN standardization efforts into four major sets, as shown in Figure 3, although the classifications are not disjoint because some standards contribute to multiple aspects. The four main pillars on which TSN is built are: (1) time synchronization, (2) guaranteed e2e latency, (3) reliability, and (4) resource management. We detail each aspect below and explain the advantages of TSN over existing industrial solutions at the end of this section.
3.1 Time Synchronization
Time synchronization is crucial for most applications targeted by the IEEE 802.1Q standards. Many TSN standards depend on network-wide precise time synchronization, with varying requirements when transitioning from AVB streaming to time-sensitive and safety-critical control applications. In a typical TSN network, a common time reference is shared by all TSN entities and used to schedule data and control signaling. Time synchronization in TSN is defined primarily by two key standards: IEEE 802.1AS and IEEE 802.1AS-Rev.
The IEEE 802.1AS standard builds on and optimizes the IEEE 1588-2008 (1588v2) protocol, defining the generalized Precision Time Protocol (gPTP) as a profile that synchronizes clocks across the network [119]. It is also one of the three IEEE 802.1 AVB standards, targeting network audio/video applications. gPTP achieves clock synchronization between network devices by exchanging predefined messages across the communication medium.
A typical gPTP deployment employs a messaging mechanism between the Clock Master (CM), also known as the GrandMaster (GM), and Clock Slaves (CS) to create a time-aware network. This network uses a peer-to-peer delay mechanism to calculate timing information such as link latency (between bridges) and residence time (within bridges). Link latency consists of the time spent on the link (e.g., the single-hop propagation delay between two adjacent switches), and residence time includes the time spent within the switch (e.g., processing time, queuing time, and transmission time). The GM clock serves as the reference time at the root of the time-aware network hierarchy and is selected by the Best Master Clock Algorithm (BMCA) [157], which automatically designates the grandmaster device. The BMCA dynamically configures the synchronization hierarchy, known as the synchronization spanning tree. This spanning tree is constructed using a priority vector derived from the announce message. Each port is assigned to one of three states: master, slave, or passive. Additionally, ports not in use are set to a disabled state.
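As a rough illustration of the peer-to-peer delay mechanism described above, the following Python sketch computes a mean link delay from the four timestamps of one request/response exchange and then accumulates link latencies and residence times along a path. It assumes a symmetric link and ignores the clock rate ratio between neighbors; the function names, parameter names, and timestamp values are hypothetical and are not taken from IEEE 802.1AS.

```python
def mean_link_delay(t1, t2, t3, t4):
    """Estimate the one-way link delay from one peer delay exchange.

    t1: request sent (initiator clock)
    t2: request received (responder clock)
    t3: response sent (responder clock)
    t4: response received (initiator clock)
    Assumes a symmetric link and equal clock rates (a simplification).
    """
    round_trip = t4 - t1   # measured entirely on the initiator clock
    turnaround = t3 - t2   # measured entirely on the responder clock
    return (round_trip - turnaround) / 2


def path_delay(link_delays, residence_times):
    """Total delay from the GM to a port: link latencies plus residence times."""
    return sum(link_delays) + sum(residence_times)


# Hypothetical timestamps and delays, all in nanoseconds.
link = mean_link_delay(t1=1_000, t2=5_600, t3=6_600, t4=2_100)
total = path_delay(link_delays=[link, 80], residence_times=[1_200])
print(link, total)
```

Because each difference is taken on a single clock, the offset between the two clocks cancels out, which is why the exchange works before the devices are synchronized.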
In the gPTP protocol, entities are divided into time-aware systems and non-time-aware systems. A time-aware system must implement one or more PTP instances for synchronization across single or multiple domains. A PTP instance is required to support essential functions of the IEEE 802.1AS standard, such as BMCA and synchronization state machine. Depending on its function, a PTP instance is further categorized as either a PTP relay instance, which communicates synchronized time from one PTP port to others, or a PTP end instance, which has only one PTP port.
IEEE 802.1AS-Rev introduces new capabilities required for time-sensitive applications in several ways. First, GMs and synchronization trees can be redundantly configured to enhance fault tolerance, allowing synchronization trees to be explicitly configured without using the BMCA algorithm. Additionally, IEEE 802.1AS-Rev supports redundant communication by enabling multiple time domains for gPTP. Each gPTP domain operates as a separate instance, allowing network devices to execute multiple instances of gPTP simultaneously. This enhances redundancy by permitting multiple grandmaster clocks and synchronization spanning trees, facilitating seamless synchronization recovery.
3.2 Bounded Latency
One primary characteristic of TSN standards is the guaranteed delivery of messages with stringent timing constraints, i.e., bounded e2e latency. In this section, we discuss several standards in TSN towards bounded latency.
3.2.1 IEEE 802.1Qav Forwarding and Queuing of Time-Sensitive Streams.
IEEE 802.1Qav specifies enhancements to the transmission selection algorithms of Ethernet switches and defines the credit-based shaper (CBS) to ensure bounded latency for time-sensitive traffic by regulating the transmission rate. CBS is a traffic shaping mechanism that regulates bandwidth allocation for high-priority shaped queues to reduce delays in medium- and low-priority unshaped queues, thereby enhancing fairness. In CBS, each output queue is associated with a credit counter. The credit counter accumulates credits while the queue waits to transmit frames and consumes credits while frames are transmitted. A frame can only be transmitted if the credit of its queue is non-negative and no other frame is being transmitted at the same time. If no frames are waiting for transmission and the credit is positive, the credit of the queue is reset to zero. The credit increases and decreases at constant, configurable rates (the idle slope and the send slope, respectively).
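The credit rules above can be sketched in a few lines of Python. This is a minimal model under stated assumptions, not the normative IEEE 802.1Qav state machine: the class and attribute names are hypothetical, the send slope is taken to be the idle slope minus the port rate, and interference from other queues is not modeled.

```python
from dataclasses import dataclass


@dataclass
class CreditBasedShaper:
    port_rate_bps: float    # line rate of the egress port
    idle_slope_bps: float   # rate at which credit accumulates while waiting
    credit_bits: float = 0.0

    @property
    def send_slope_bps(self):
        # Assumption: credit drains at (idle slope - port rate) while sending.
        return self.idle_slope_bps - self.port_rate_bps

    def can_transmit(self):
        return self.credit_bits >= 0

    def wait(self, seconds, frames_queued):
        """Advance time while this queue is not transmitting."""
        self.credit_bits += self.idle_slope_bps * seconds
        if not frames_queued and self.credit_bits > 0:
            self.credit_bits = 0.0   # positive credit is dropped on an empty queue

    def transmit(self, frame_bits):
        """Send one frame and return its transmission time in seconds."""
        if not self.can_transmit():
            raise RuntimeError("credit is negative; frame must wait")
        duration = frame_bits / self.port_rate_bps
        self.credit_bits += self.send_slope_bps * duration
        return duration


# Hypothetical setup: 100 Mbit/s port with 25 Mbit/s reserved for this queue.
shaper = CreditBasedShaper(port_rate_bps=100e6, idle_slope_bps=25e6)
shaper.transmit(12_000)                      # credit drops to about -9000 bits
blocked = not shaper.can_transmit()
shaper.wait(400e-6, frames_queued=True)      # credit recovers to about +1000 bits
print(blocked, shaper.can_transmit())
```

In this model the queue can use the full line rate only in short bursts, and its long-run throughput is capped at the idle slope.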
For bandwidth-intensive applications, CBS can establish an upper bound for each traffic class, ensuring that no traffic class exceeds the pre-configured threshold on reserved bandwidth, typically less than 75% of the maximum bandwidth. Along with SRP, the CBS shaper aims to limit delays to less than 250 \(\mu\)s per bridge and the worst-case latency to 2 ms for class A and 50 ms for class B in a simple network setup [73]. However, these delays may still be too high for industrial applications. This has motivated the TSN TG to introduce other standards, such as IEEE 802.1Qbv, IEEE 802.1Qch, and IEEE 802.1Qcr, to meet the stringent timing requirements of industrial applications.
3.2.2 IEEE 802.1Qbv Enhancements to Traffic Scheduling (Time-Aware Shaper (TAS)).
IEEE 802.1Qbv introduces the concept of a gate per queue to control the opening and closing of a queue, where a frame can be transmitted only if the gate of the corresponding queue is open. In TAS, critical traffic is scheduled in protected traffic windows with allocated time slots, similar to the TDMA paradigm. Each window can have an allotted transmission time for high-priority traffic, as illustrated in Figure 4. To prevent potential interference, the traffic windows are isolated by a specified time duration, called the guard band. The guard bands enforce time intervals after best-effort traffic during which all gates are closed, ensuring that neither best-effort traffic nor periodic traffic can be sent during these intervals. These guard bands are required to prevent large best-effort frames from interfering with periodic traffic.
The TAS shaper requires that all traffic windows be well synchronized and scheduled among all the time-aware bridges. The communication schedule in IEEE 802.1Qbv is realized by the scheduled gate mechanism, which controls the opening and closing of queues using a pre-determined
gate control list (GCL). Each GCL includes a limited number of entries, with each entry providing the status of associated queues over a particular duration. The GCL repeats itself periodically, and this period is called the cycle time. The network-wide schedule is generated by
centralized network configuration (CNC) and deployed on individual bridges. Although the IEEE 802.1Qbv standard defines the scheduling mechanism of TAS, its configuration, i.e., what to put in the GCL and how to assign queues for individual traffic at each hop, lacks a clear-cut best practice [
155]. This has resulted in significant efforts from both researchers and practitioners to study the TAS-based scheduling problems in various industrial applications. More discussion regarding TAS scheduling is provided in Section
4.3.
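To make the GCL mechanism more concrete, the sketch below models a gate control list as a sequence of (gate states, duration) entries and looks up which gates are open at a given time within the repeating cycle. It is only an illustration: the names `GateEntry`, `open_gates`, and `guard_band_ns` are hypothetical, the example schedule is invented, and the guard band is sized as the transmission time of one maximum-size interfering frame, which is a common but assumed choice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateEntry:
    gate_states: int   # bit i set means the gate of queue i is open
    duration_ns: int


def open_gates(gcl, time_ns, base_time_ns=0):
    """Return the gate-state bitmask in effect at time_ns."""
    cycle_ns = sum(entry.duration_ns for entry in gcl)
    offset = (time_ns - base_time_ns) % cycle_ns   # the GCL repeats every cycle
    for entry in gcl:
        if offset < entry.duration_ns:
            return entry.gate_states
        offset -= entry.duration_ns
    raise AssertionError("unreachable: offset is always inside the cycle")


def guard_band_ns(max_frame_bytes, link_bps):
    """Time needed to drain one maximum-size frame already in transmission."""
    return max_frame_bytes * 8 * 1_000_000_000 // link_bps


# Hypothetical 1 ms cycle on a 1 Gbit/s link; 1542 bytes includes preamble and gap.
guard = guard_band_ns(max_frame_bytes=1542, link_bps=1_000_000_000)
gcl = [
    GateEntry(0b1000_0000, 200_000),          # protected window for queue 7
    GateEntry(0b0111_1111, 800_000 - guard),  # all other queues
    GateEntry(0b0000_0000, guard),            # guard band: every gate closed
]
print(guard, bin(open_gates(gcl, 100_000)), bin(open_gates(gcl, 990_000)))
```

Deciding the durations and the queue assignment per hop is the scheduling problem that the standard leaves open; the lookup itself is simple.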
3.2.3 IEEE 802.3br and 802.1Qbu Interspersing Express Traffic and Frame Preemption.
To address the priority inversion problem, i.e., the ongoing transmission of a low-priority frame blocking the transmission of high-priority frames, the IEEE 802.1 TG along with the IEEE 802.3 TG defined the frame preemption protocol in IEEE 802.1Qbu and IEEE 802.3br. These technologies work together to manage traffic through changes to both the MAC scheme, as controlled by IEEE 802.3, and the management mechanisms, as supervised by IEEE 802.1. The frame preemption capability can be combined with any traffic management algorithm defined in IEEE 802.1Q, such as the TAS shaper and the CBS shaper, to enhance determinism and real-time performance for critical traffic.
IEEE 802.1Qbu allows time-critical frames to preempt non-critical frames on the same physical link, even while those frames are in transmission; the interrupted non-critical frame is split into smaller fragments that are reassembled at the receiving side. This frame preemption scheme divides an egress port into two distinct interfaces based on the MAC layer: preemptable MAC (pMAC) and express MAC (eMAC) [96]. The pMAC carries preemptable frames, while the eMAC carries express (preempting) frames. An incoming frame is mapped to only one egress interface according to the frame preemption status table, with the default option being the eMAC.
IEEE 802.3br introduces an optional sublayer called the MAC Merge sublayer, which attaches an eMAC and a pMAC to the PHY layer through a reconciliation sublayer [
174]. The PHY layer remains unaware of the preemption, while the MAC Merge sublayer and its MACs support frame preemption as defined in IEEE 802.1Qbu. The MAC Merge sublayer provides two approaches to manage the transmission of preemptable traffic alongside express traffic. One approach interrupts (preempts) the preemptable traffic currently being transmitted, while the other prevents preemptable traffic from being transmitted in the first place.
3.2.4 IEEE 802.1Qch Cyclic Queuing and Forwarding (CQF).
The IEEE 802.1Qch standard introduces the CQF mechanism, also known as the Peristaltic Shaper (PS) [127]. CQF is an efficient forwarding scheme proposed to simplify the design of a TSN switch, and it can deliver predictable and deterministic e2e latency [101]. It is designed for limited-scale networks with time synchronization. Among the eight queues of each switch port, CQF reserves at least two queues that perform enqueue and dequeue operations in a cyclic manner. Figure 5 shows an example of CQF operation on a chain topology with two switches, SW1 and SW2. Time is divided into equal cycles of length \(T\), delimited by the red vertical lines. During the first interval (i.e., cycle \(x\)), frames \(A\), \(B\), and \(C\) are sent out by end station ES1 and arrive at SW1, where they are enqueued in \(q_1\). In the following interval (i.e., cycle \(x+1\)), these frames are dequeued and forwarded to SW2, where they are stored in \(q_1\). Meanwhile, another two frames, \(D\) and \(E\), arrive at SW1 and are enqueued in another queue, \(q_2\). The operation repeats in each cycle. CQF can provide a deterministic e2e latency guarantee since it follows two principles: (1) the sending cycle of a frame on a switch and the receiving cycle on the subsequent switch are the same, and (2) any frame received by a switch in cycle \(x\) must be sent out in the next cycle \(x+1\). Thus, the e2e latency of a frame is determined by the routing path length and the cycle size \(T\).
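The two principles imply simple latency bounds, which the following sketch works out. Under the assumption that a frame sent by the talker in cycle \(x\) is delivered to the listener in cycle \(x+n\) after crossing \(n\) switches, the latency lies between \((n-1)T\) (sent at the end of its cycle, delivered at the start of the last one) and \((n+1)T\) (sent at the start, delivered at the end). The function name and the example values are hypothetical.

```python
def cqf_latency_bounds(num_switches, cycle_us):
    """Return (lower, upper) e2e latency bounds in microseconds.

    Assumes every switch forwards a frame exactly one cycle after
    receiving it, and that sending and receiving cycles of adjacent
    devices coincide (the two CQF principles).
    """
    if num_switches < 1:
        raise ValueError("path must contain at least one switch")
    lower = (num_switches - 1) * cycle_us
    upper = (num_switches + 1) * cycle_us
    return lower, upper


# Two switches with a hypothetical 100 microsecond cycle.
bounds = cqf_latency_bounds(num_switches=2, cycle_us=100)
print(bounds)
```

The jitter is therefore bounded by \(2T\) regardless of path length, which is why the choice of cycle length dominates CQF design.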
The frame preemption scheme can also work together with CQF to shorten the cycle time of frame transmission, as the size of a frame fragment is smaller than that of a full frame. To make CQF work properly, all frame fragments must be received within the scheduled time cycle. Accordingly, to guarantee bounded and deterministic latency, it is crucial to carefully design the cycle length along the routing path. Due to its simplicity, CQF can be easily supported by extending a standard Ethernet switch with statically configured queues.
3.2.5 IEEE 802.1Qcr Asynchronous Traffic Shaping (ATS).
The TAS shaper can provide deterministic real-time communication in a TSN network but requires high-precision network-wide time synchronization. However, industrial networks may suffer from timing misalignment, such as drift or skew in timing signal frames, lost timing frames, and inaccuracy, which can cause asynchrony. This issue worsens with the increasing scale of the network [
172]. To address this, IEEE 802.1Qcr aims to smooth out traffic patterns by reshaping TSN streams per hop and prioritizing urgent traffic over non-deterministic traffic. The ATS shaper works asynchronously, not requiring synchronization on traffic transmission, and relies heavily on an
Urgency Based Scheduler (UBS). The UBS prioritizes urgent traffic by queuing and reshaping each individual frame at each hop. Asynchronicity is achieved through a
Token Bucket Emulation (TBE) and an interleaved shaping algorithm to eliminate burstiness. The TBE controls traffic by the average transmission rate but allows a small portion of burst traffic to occur. Figure
6 shows an example of an ATS shaper. The ATS shaper determines the traffic type of each incoming frame at the ingress port. Urgent traffic is assigned to an urgent queue, which follows strict priority scheduling, while traditional high-priority scheduled traffic and low-priority best-effort traffic follow a fair multiplexed transmission scheme.
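The token-bucket idea behind ATS can be illustrated with a small per-stream model that assigns each frame an eligibility time using only the local clock. This is a simplified sketch, not the IEEE 802.1Qcr algorithm: the class and parameter names are hypothetical, frames are assumed to arrive in order, and interleaved shaping across streams is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class TokenBucketShaper:
    rate_bps: float     # average (committed) rate of the stream
    burst_bits: float   # bucket size, i.e. the tolerated burst
    tokens: float = field(init=False)
    last_time_s: float = 0.0

    def __post_init__(self):
        self.tokens = self.burst_bits   # start with a full bucket

    def eligibility_time(self, arrival_s, frame_bits):
        """Return the earliest time the frame may be forwarded."""
        if frame_bits > self.burst_bits:
            raise ValueError("frame larger than the bucket can never be sent")
        now = max(arrival_s, self.last_time_s)
        # Refill tokens for the elapsed time, capped at the bucket size.
        refill = (now - self.last_time_s) * self.rate_bps
        self.tokens = min(self.burst_bits, self.tokens + refill)
        if self.tokens < frame_bits:
            # Delay the frame until enough tokens have accumulated.
            now += (frame_bits - self.tokens) / self.rate_bps
            self.tokens = float(frame_bits)
        self.tokens -= frame_bits
        self.last_time_s = now
        return now


# Hypothetical stream: 1 Mbit/s average, 2000-bit burst, three back-to-back frames.
shaper = TokenBucketShaper(rate_bps=1_000_000, burst_bits=2_000)
times = [shaper.eligibility_time(0.0, 1_000) for _ in range(3)]
print(times)
```

The first two frames pass immediately as an allowed burst, while the third is held back until the bucket refills, which is how the average rate is enforced without any network-wide schedule.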
Table 1 provides a summary of different TSN shapers. In the table, ‘Synchronization’ represents the network model, which can be either synchronous or asynchronous, and ‘\(/\)’ indicates that it does not require time synchronization. ‘Main Tech’ refers to the main technology the shaper uses, e.g., TDMA. ‘Topology Dependence’ indicates whether the e2e latency is influenced by the adopted network topology. ‘Trigger’ represents the triggering mechanisms of the shaper.
3.3 Reliability
Ultra-high reliability is another fundamental QoS requirement for industrial critical traffic. To achieve this, TSN provides several mechanisms to exploit the spatial redundancy of the communication channel and transmit replicated frames through multiple channels to tolerate both permanent and temporary faults. For this purpose, several standards have been defined in TSN, including IEEE 802.1CB and IEEE 802.1Qca. The IEEE 802.1CB standard manages creating and eliminating frame replicas to be transmitted through the existing path(s), while IEEE 802.1Qca allows for creating and managing multiple paths between any pair of nodes in the network. Besides, the IEEE 802.1Qci standard defines frame filtering and policing operations.
3.3.1 IEEE 802.1CB Frame Replication and Elimination for Reliability (FRER).
The IEEE 802.1CB standard lowers packet loss probability by replicating transmitted packets, sending them on disjoint network paths, and eliminating duplicate replicas at the receiver. IEEE 802.1CB is a self-contained standard that guarantees reliable and robust communication among applications through proactive measures to tolerate frame losses. Specifically, IEEE 802.1CB includes features such as sequence numbering, replication of each packet in the source station and/or network relay components, transmission of duplicates across separate paths, and elimination of duplicates at the destination and/or other relay components. By sending duplicate copies of critical traffic across disjoint network paths, IEEE 802.1CB minimizes the impact of congestion and failures, such as cable breakdowns. The duplicates are eliminated based on the sequence numbers carried in the frames. To enhance robustness and cope with errors, such as those caused by a stuck transmitter repeatedly sending the same packet, a recovery function is defined to remove packets with repeated sequence numbers.
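The elimination step can be sketched as a small per-stream filter that remembers recently seen sequence numbers. This is a deliberately simplified stand-in for the recovery functions of IEEE 802.1CB: the class name, the fixed-length history, and the example arrival order are assumptions made for illustration, and sequence-number wraparound is not handled.

```python
from collections import deque


class SequenceRecovery:
    """Accept the first copy of each sequence number and drop repeats."""

    def __init__(self, history_length=16):
        # Only the most recent sequence numbers are remembered.
        self.history = deque(maxlen=history_length)

    def accept(self, sequence_number):
        if sequence_number in self.history:
            return False   # duplicate from another path or a stuck transmitter
        self.history.append(sequence_number)
        return True


# Hypothetical arrivals as (path, sequence number); frame 1 is lost on path B
# and the sender on path A repeats frame 3.
arrivals = [("A", 0), ("B", 0), ("A", 1), ("A", 2),
            ("B", 2), ("B", 3), ("A", 3), ("A", 3)]
recovery = SequenceRecovery()
delivered = [seq for _, seq in arrivals if recovery.accept(seq)]
print(delivered)
```

Each frame is delivered exactly once even though one path dropped a frame and the other repeated one, which is the behavior the replication and elimination functions aim for.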
3.3.2 IEEE 802.1Qca Path Control and Reservation (PCR).
The IEEE 802.1Qca standard builds on two schemes: the
Type-Length-Value (TLV) extension and the
IS-IS (Intermediate System to Intermediate System) protocol. The TLV extension is based on the
Link State Protocol (LSP) of IETF, while the IS-IS protocol is used to establish connections among stations along the transmission path. This enables the IS-IS protocol to control bridged networks, extending the capabilities of the
shortest path bridging (SPB) to manage multiple routes on the network [
133]. IEEE 802.1Qca provides mechanisms for bandwidth allocation and improves redundancy through various methods, such as protection schemes based on multiple redundant trees, local protection for unicast data flows based on loop-free alternates, and restoration after topology changes (e.g., following a failure event).
3.3.3 IEEE 802.1Qci Per-Stream Filtering and Policing (PSFP).
The IEEE 802.1Qci standard defines protocols and procedures for filtering, policing, and service class selection on a per-stream basis. Filtering and policing functions include stream filters, stream gates, and flow meters to determine whether each frame is allowed to pass through to the egress queue. By setting up filtering rules and monitoring the passing frames, the standard can perform mitigation actions if violations are detected. Thus, IEEE 802.1Qci provides QoS protection when multiple streams share the same egress queue of a switch, preventing interference among them [
18]. In addition, it improves network security against DoS attacks by identifying and dropping unauthorized or malicious transmissions, enhancing network robustness.
3.4 Resource Management
Resource management is another key aspect of TSN to ensure the efficient allocation and utilization of network resources to meet the stringent requirements of industrial applications. It involves various mechanisms and protocols to manage network bandwidth, prioritize traffic, and maintain QoS through the definition of several standards, including IEEE 802.1Qcp, IEEE 802.1Qcc, and IEEE 802.1CS.
3.4.1 IEEE 802.1Qcp YANG Data Model.
IEEE 802.1Qcp defines a
YANG (Yet Another Next Generation) data model, specifying a data modeling language used to model configuration data and state data manipulated by network management protocols such as NETCONF and RESTCONF. Using the YANG model, IEEE 802.1Qcp allows configuration and status reporting based on
Unified Modeling Language (UML) to manage IEEE 802.1 bridge devices. YANG models the hierarchical organization of data as a tree, with each node representing configuration data, state data,
RPC (remote procedure call) operations, and notifications. A set of related data nodes are organized into a module, the primary building block of the YANG model [
22]. To simplify the maintenance and management of complex modules, each module can be further subdivided into submodules. The industry-wide implementation of the YANG model provides a universal interface to integrate resource management across diverse devices and equipment to fulfill the TSN standards.
3.4.2 IEEE 802.1Qcc SRP Enhancements and Performance Improvements.
The IEEE 802.1Qcc standard is an enhancement of the Stream Reservation Protocol (SRP) (IEEE 802.1Qat) and deals with the configuration of TSN networks. IEEE 802.1Qat, originally designed for the CBS shaper, manages the registration and reservation of resources within each bridge (e.g., buffers and queues) along the traffic path between the talker and the listener. Specifically, it serves as an admission control protocol in which the talker registers its traffic with the required bandwidth and is granted or denied permission depending on resource availability. This enables QoS management for streams with specific latency and bandwidth requirements.
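The admission-control idea can be sketched as a per-port bandwidth check applied hop by hop along the path. The sketch below is an assumption-laden illustration rather than the SRP protocol itself: the class and function names are hypothetical, only bandwidth is tracked (not buffers or latency), and the 75% reservable fraction is used as an example default.

```python
class PortReservation:
    """Tracks reserved bandwidth on one egress port."""

    def __init__(self, link_bps, max_reservable_fraction=0.75):
        self.limit_bps = link_bps * max_reservable_fraction
        self.streams = {}   # stream id -> reserved bandwidth in bit/s

    def admit(self, stream_id, bandwidth_bps):
        reserved = sum(self.streams.values())
        if reserved + bandwidth_bps > self.limit_bps:
            return False
        self.streams[stream_id] = bandwidth_bps
        return True

    def release(self, stream_id):
        self.streams.pop(stream_id, None)


def reserve_along_path(ports, stream_id, bandwidth_bps):
    """Admit the stream on every port of the path, or on none of them."""
    admitted = []
    for port in ports:
        if not port.admit(stream_id, bandwidth_bps):
            for earlier in admitted:
                earlier.release(stream_id)   # roll back partial reservations
            return False
        admitted.append(port)
    return True


# Hypothetical two-hop path with 100 Mbit/s and 80 Mbit/s links.
path = [PortReservation(100_000_000), PortReservation(80_000_000)]
results = [
    reserve_along_path(path, "s1", 50_000_000),
    reserve_along_path(path, "s2", 20_000_000),   # fails on the second hop
    reserve_along_path(path, "s3", 10_000_000),
]
print(results)
```

Rolling back on failure keeps the bridges consistent: a stream either holds resources on its whole path or on none of it.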
IEEE 802.1Qcc amends the IEEE 802.1Qat standard by extending the capabilities of SRP to adopt more complex shaping mechanisms, such as TAS with frame preemption. IEEE 802.1Qcc defines a
user-network interface (UNI), which provides an abstract functionality between end stations (i.e., user side) and bridges (i.e., network side). The high-level idea is that the user specifies the requirement for the streams they want to transmit without knowing all the details about the network, and the network analyzes this requirement along with network capabilities and configures the bridges to meet the user requirements. IEEE 802.1Qcc defines three configuration models [
129], as shown in Figure
7: the fully centralized model, the centralized network/distributed user model, and the fully distributed model. The fully centralized model introduces
Centralized User Configuration (CUC) as the centralized manager for end users and provides the user requirements to the CNC through UNI. In the centralized network/distributed user model, the CNC configures TSN elements according to user requirements provided by the end bridges connecting end stations through UNI. In the fully distributed model, there is no centralized network configuration entity, and the network is configured in a fully distributed manner.
3.4.3 IEEE 802.1CS Link-Local Reservation Protocol (LRP).
The IEEE 802.1CS standard facilitates the replication of a registration database within a network link, i.e., from the device at one end to the device at the other end of the link. This enhances communication regarding resource registration among point-to-point devices and enables dynamic discovery, registration, and management of resources at a local level. The current 802.1Q Multiple Registration Protocol supports databases up to 1500 bytes and significantly slows down when handling larger databases. To address this limitation, LRP is optimized to support the replication of registration databases on the order of 1 Mbyte. This enhancement enables new applications requiring much larger data sizes for configuration, registration, and reservation. LRP improves resource management efficiency since it operates within the local network segment without centralized management.
3.4.4 IEEE 802.1Qdd Resource Allocation Protocol (RAP).
IEEE 802.1Qdd defines RAP, which uses LRP from IEEE 802.1CS to support dynamic resource reservation for unicast and multicast streams in the fully distributed model. RAP also provides support for accurate latency calculation and reporting, and it is not limited to bridged networks. It aims to address issues present in the current IEEE 802.1Q
Multiple Stream Reservation Protocol (MSRP), which has limitations in terms of the number of reservations, admissions, and configuration size in distributed stream reservation scenarios [
92]. As of this writing, the standardization of RAP is still ongoing (IEEE P802.1Qdd Draft 0.9).
The advantages of TSN compared to existing industrial solutions. Having detailed the major capabilities of TSN, we now summarize its advantages over existing Ethernet-based fieldbus systems. These advantages include openness, interoperability, convergence, and performance. First, openness and standardization are crucial to industrial automation since they promote wide cooperation among industrial partners. TSN is an open and standardized IEEE technology that is unaffiliated with any organization or company, and thus the major manufacturers are very active in promoting TSN. Second, TSN ensures vendor-independent interoperability among industrial devices, avoiding vendor lock-in and enabling system-wide connectivity. The combination of OPC UA and TSN, described in the following section, further supports communication all the way from the sensor to the cloud. Moreover, TSN enables the convergence of IT and OT, which were previously kept separate in traditional industrial Ethernet-based protocols. Breaking down the communication barriers between IT and OT makes it easier to access data from industrial subsystems, and different traffic types can coexist in the network with their specific QoS requirements being met. In addition to the above advantages, TSN also excels in performance. While some advanced Ethernet-based protocols, e.g., PROFINET IRT, can also achieve deterministic real-time performance, TSN surpasses these solutions in latency (cycle time below 50 microseconds), jitter (less than \(\pm\)100 nanoseconds), and scalability (more than 10,000 network nodes) [26]. Therefore, its openness, vendor-neutral interoperability, IT/OT integration support, and higher network performance make TSN a highly effective and reliable choice for modern industrial automation.
4 Integrating TSN into Industrial Automation
In this section, we first detail the key benefits of TSN for industrial automation and highlight the opportunities for integrating TSN into industrial automation through potential system-level integration. We then elaborate on TSN traffic scheduling for achieving deterministic timing guarantees. Finally, as a crucial step before deploying TSN in the field, we discuss the importance of TSN testbeds, highlighting their role in validating TSN performance in real-world industrial environments.
4.1 Why Do We Need TSN in Industrial Automation?
TSN is a game-changing technological advancement based on Ethernet, and it is set to reshape the industrial communication landscape. This is mainly due to the many benefits offered by TSN to modern industrial automation networks, e.g., interoperability, convergence, and determinism.
As described in Section 2.1.1, the connectivity of industrial devices, i.e., interoperability, plays a critical role in industrial automation. At present, there are many tailored protocols and customized devices on the market for industrial Ethernet-based applications. In many industrial application scenarios, however, customers may select different industrial Ethernet protocols to deploy their devices. This results in protocol incompatibility and leads to vendor lock-in, which leaves the customers with only two options. One is to purchase all their devices from the same vendor even though some are not their best choices. The other is to purchase their devices from multiple vendors but develop a conversion solution to integrate the devices, e.g., by implementing gateways to adapt among various industrial Ethernet protocols. However, both options are costly and can limit innovation on the factory floor [25]. As an open IEEE standard, TSN guarantees compatibility at the network level among devices from different vendors. With TSN, a network consisting of devices from multiple vendors can interoperate and be configured via a single standard interface. This provides customers with more options to build their system, avoids vendor lock-in, and enables connectivity across systems. The standardized network structure also leads to a lower cost of ownership since the customers only need to replace existing switches with TSN switches instead of duplicating networks and maintaining the additional hardware and software.
The IT/OT integration, accelerated by the rapid development of advanced manufacturing, acts as another critical enabler in the automation industry [
107]. In legacy industrial Ethernet-based networks, different communication needs for IT and OT hinder the integration of these two fields. Specifically, a larger bandwidth is typically required for data communication in the IT field, while deterministic performance is the key for OT involving control operations. On the other hand, the digitization trend of industrial automation requires that all types of data (e.g., analog signals, sounds, images, and texts) be converged. To this end, TSN provides the capability to break down communication barriers between various subsystems, including critical and non-critical systems. Different traffic types can coexist and be transmitted over the same network without lower-priority traffic affecting traffic with a higher criticality level. Network convergence provided by TSN makes it easier to access data from industrial systems and send them to the enterprise systems over standard Ethernet, or the other way around, without the need for gateways.
Despite handling various traffic types across numerous devices in such converged networks, TSN can still provide deterministic performance guarantees, especially for critical traffic. TSN ensures that the timing of critical traffic is predictable and consistent, which is essential for industrial automation applications. With deterministic message delivery, devices can communicate in real time, which simplifies the configuration of systems, devices, and applications and increases productivity by enabling machines to run cooperatively rather than independently. Informed decision-making by humans or other machines can also take place in real time. This benefit of deterministic communication is achieved through TSN traffic scheduling based on network-wide time synchronization, which will be elaborated in Section
4.3.
4.2 TSN-based Converged Industrial Networks
TSN standardizes a set of technologies within the framework of IEEE 802.1 to provide guaranteed QoS. It is worth noting that TSN resides only at Layer 2 of the OSI model, i.e., it aims to provide bounded latency and jitter for point-to-point communication. Thus, TSN is not a complete communication protocol. Rather, it is a building block that provides the deterministic foundation for converged industrial networks, and it must be combined with higher-layer protocols to provide end-to-end QoS guarantees. At the same time, industrial automation requires Ethernet to support the convergence of all kinds of networks and traffic types typically found in an industrial setting.
Converged networks in industrial settings require flexibility and scalability to use the same infrastructure (including small devices such as sensor nodes and machine and production line control devices, as well as large devices such as data servers) for concurrent transmission of deterministic real-time communication (e.g., OT traffic) and non-deterministic best-effort communication (e.g., IT traffic). TSN is deemed a key enabling technology for establishing converged industrial networks, with the following two trends [
138]: (1) Fieldbus
5 over TSN, and (2) OPC UA over TSN. Table
2 gives a summary of representative TSN-based converged industrial network solutions. Their details are described below.
4.2.1 Fieldbus over TSN.
At present, the industrial communication market is still dominated by Ethernet-based fieldbus systems, and there are many different fieldbus solutions on the market, e.g., PROFINET, EtherNet/IP, EtherCAT, Powerlink, and CC-Link. A major obstacle for today’s Ethernet-based fieldbus systems is that they do not fulfill the convergence requirement of emerging industrial automation applications (e.g., close IT/OT integration). Combining industrial fieldbuses with TSN provides a way to meet this requirement. There are two main approaches for transmitting industrial fieldbus communication over TSN. One approach is to set up a new TSN network in factory networks that conforms to every specification of the newly defined IEEE standards at Layer 1 and Layer 2 of the OSI model, so that fieldbus traffic can be transmitted without alteration. The other approach is to install active network gateways that convert traffic from other networks into TSN-compatible Ethernet frames [
138].
Many fieldbus providers are already offering their products mapped to TSN, enabling seamless integration. For example, PROFINET over TSN [
99] makes use of TSN features and supplements PROFINET on the Ethernet layer with IEEE-standardized counterparts. With TSN, PROFINET stands on a robust and future-proof foundation, which in turn creates more planning reliability for production and industrial solutions. Meanwhile, existing PROFINET services (e.g., diagnostics and parameterization) and profiles (e.g., PROFIsafe, PROFIenergy, PROFIdrive) work as before on top of PROFINET over TSN and do not require any changes from the user.
EtherCAT over TSN [
145] defines a seamless adaptation to use both technologies and capitalize on their respective advantages without requiring any changes to the EtherCAT slaves. Adding EtherCAT segments as structuring elements in TSN reduces the complexity in backbones by using shared frames for a group of slaves and enabling internal configuration for a machine. TSN will protect EtherCAT segments from unwanted traffic while increasing the efficiency of the combined EtherCAT-TSN system. Combined EtherCAT and TSN can enhance flexibility at the automation cell level while maintaining total control of the various automation tasks.
ODVA, which is a standards development organization and membership association, presents a recommended high-level approach for incorporating TSN capability into EtherNet/IP and identifies several major technical aspects of EtherNet/IP over TSN [
60]. TSN will be introduced in ODVA technologies as an optional and backward-compatible Data Link Layer for the EtherNet/IP implementation of the Common Industrial Protocol (CIP).
CC-Link IE TSN [
32] is an open industrial network utilizing TSN to seamlessly connect information systems to production sites. With TSN, CC-Link IE TSN is able to increase openness while further strengthening performance and functionality. In addition to the above solutions with individual fieldbus systems, [
34] designs a hybrid wired/wireless protocol conversion module that enables intercommunication among three industrial Ethernet protocols (PROFINET, EtherCAT, and EtherNet/IP) and proposes a TSN-compatible frame for communicating with a TSN-based gateway.
4.2.2 OPC UA over TSN.
Today’s proprietary Ethernet-based fieldbus systems are broadly applied across different industrial automation networks to meet specific topology requirements, communication speeds, or latency guarantees. However, these communication protocols are often incompatible, resulting in fragmented networks that cannot seamlessly communicate with each other. OPC UA [
70] was developed to solve this problem by allowing industrial devices operating with different protocols and on different platforms (e.g., Windows, Mac, or Linux) to communicate with each other. OPC UA supports two communication models: client-server (point-to-point communication based on TCP/IP) and publish-subscribe (one-to-many communication supported by the new PubSub extension). Neither model provides real-time capability on its own. In conjunction with TSN, however, OPC UA under the PubSub communication model allows deterministic transmission of real-time data while retaining the flexibility and openness inherent to OPC UA [
131]. Note that OPC UA over TSN and the fieldbus over TSN systems discussed above clearly overlap; however, they will not replace each other and will likely coexist for a long while. The reason is that the strength of OPC UA, with real-time communication enabled by TSN, lies in allowing different networks to communicate, especially at the factory and enterprise levels. Industrial Ethernet, on the other hand, is primarily designed for communication between field devices and controllers. Below, we briefly discuss some OPC UA over TSN solutions.
[
72] proposes a communication architecture using the OPC UA and TSN for manufacturing systems. The proposed OPC UA TSN is a two-tier communication architecture, including the upper factory-edge tier and the lower edge-field tier. TSN is adopted as the communication backbone to connect different control subsystems in the field layer and the entities of the upper layers. OPC UA is adopted to realize horizontal and vertical information exchange between the entities of each layer. [
98] presents an OPC UA PubSub over TSN, which enables TSN to be used for the transport of OPC UA PubSub messages in practice. In the proposed approach, the message for the publisher is prepared in a (hardware-triggered) interrupt to ensure short delays and small jitter. Specific modifications are performed to allow the interaction between a best-effort standard OPC UA server and a real-time OPC UA PubSub publisher with access to a shared information model. The approach was implemented in open source based on the open62541 OPC UA SDK. [
55] presents a case study on a TSN-enabled OPC UA integration for a field device. The evaluation indicates that the OPC UA integration of the field devices can be implemented using COTS software and hardware components. These R&D efforts validate the potential of OPC UA TSN as a vendor-independent successor technology. OPC UA TSN is expected to quickly become a game changer in the field of industrial automation and a promising candidate for establishing a holistic communication infrastructure from the sensor to the cloud [
26].
4.3 Traffic Scheduling
As described in Section
3, the TSN TG has developed a suite of traffic shapers in the TSN standards, including TAS, CBS, PS, and ATS (see the summary in Table
1). These shapers provide a toolkit for managing network traffic to meet the diverse timing requirements. Among these shapers, TAS stands out and draws special attention due to its ability to achieve deterministic timing guarantees by leveraging network-wide synchronization and time-triggered traffic scheduling mechanisms [
171], making it a key enabler to support deterministic real-time traffic in industrial automation.
A TSN switch is equipped with a set of time-gated queues to buffer frames from different traffic flows, and the control of the queues is specified by a predefined GCL. In addition, the priority filter in each switch utilizes a 3-bit Priority Code Point (PCP) field in the packet header to identify the stream priority and directs incoming traffic to the specific egress queue according to the priority-to-queue mapping. The configuration of GCL and traffic-to-queue mapping together define the network-wide schedule, which is determined by CNC and deployed on individual switches to guarantee the timing requirements of all time-triggered traffic. Traffic scheduling is thus one of the most critical problems in TSN, resulting in a large amount of research effort to develop various novel scheduling methods.
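To make the interplay between the GCL and the priority-to-queue mapping concrete, the following Python sketch models a single egress port. It is only an illustration under simplifying assumptions: the names `GclEntry`, `pcp_to_queue`, and `open_queues` are hypothetical, eight queues are assumed, the default PCP-to-queue mapping is taken to be the identity, and the example cycle (200 µs for queue 7, 800 µs for the remaining queues) is invented for demonstration.

```python
from dataclasses import dataclass

NUM_QUEUES = 8  # assumption: one egress queue per PCP value


@dataclass(frozen=True)
class GclEntry:
    duration_ns: int  # how long this entry stays active within the cycle
    gate_mask: int    # bit q set => the gate of queue q is open


def pcp_to_queue(pcp, mapping=None):
    """Map a 3-bit PCP value to an egress queue (identity mapping assumed by default)."""
    if not 0 <= pcp <= 7:
        raise ValueError("PCP is a 3-bit field")
    mapping = mapping or list(range(NUM_QUEUES))
    return mapping[pcp]


def open_queues(gcl, t_ns):
    """Return the queues whose gates are open at time t_ns; the GCL repeats cyclically."""
    cycle_ns = sum(entry.duration_ns for entry in gcl)
    offset = t_ns % cycle_ns
    for entry in gcl:
        if offset < entry.duration_ns:
            return [q for q in range(NUM_QUEUES) if (entry.gate_mask >> q) & 1]
        offset -= entry.duration_ns
    raise AssertionError("unreachable: offset is always inside the cycle")


# Illustrative 1 ms cycle: an exclusive window for queue 7, then all other queues.
example_gcl = [
    GclEntry(duration_ns=200_000, gate_mask=0b1000_0000),
    GclEntry(duration_ns=800_000, gate_mask=0b0111_1111),
]
```

With this example, a frame tagged with PCP 7 is mapped to queue 7 and can only be transmitted during the first 200 µs of every cycle, which is the temporal isolation that the network-wide schedule relies on.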
Industrial applications that employ TSN as the communication fabric can be diverse regarding traffic patterns, network topology, deployment environment, and QoS requirements. Consequently, the specific TSN scheduling problem to be studied may vary significantly from the perspectives of the network model, traffic model, and scheduling model.
— The network model defines key attributes of the directed logical links in TSN, such as the propagation delay on Ethernet cables, processing delay on switches, link rate, number of available queues, and maximum GCL length. These parameters are typically determined by the capacity of the TSN switch or end station connected to each link.
— The traffic model defines the parameters characterizing each TSN flow, including release time, period, payload size, deadline, and jitter. Each parameter can be individually modeled to capture the targeted traffic type based on specific industrial application scenarios. For example, the traffic model can be classified into fully schedulable or partially schedulable traffic, depending on whether the release time of flows is predefined or determined by the corresponding talker. Additionally, based on jitter requirements, the traffic model can be categorized as a zero-jitter model or a jitter-allowed model.
— The scheduling model specifies the constraints on the TSN system, including queuing delay, scheduling entity, routing and scheduling co-design, fragmentation, and preemption. For instance, based on assumptions regarding queuing delay, scheduling models can be classified into no-wait and wait-allowed models. The scheduling entity determines whether the model is frame-based or window-based. Furthermore, depending on whether the routing path of each traffic flow is predefined or needs to be determined, scheduling models can be categorized as fixed routing models and joint routing and scheduling models.
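As a rough illustration of how the traffic-model parameters above feed into a scheduling problem, the sketch below represents each flow as a record and derives two quantities that most schedulers need: the hyperperiod (the cycle over which a GCL repeats) and the utilization of a link. All names are hypothetical, the 42-byte per-frame overhead is an assumption (preamble and SFD, MAC header, VLAN tag, FCS, and inter-frame gap), and minimum-frame padding is ignored.

```python
from dataclasses import dataclass
from math import lcm

# Assumed per-frame overhead: preamble+SFD 8, MAC header 14, VLAN tag 4, FCS 4, IFG 12.
FRAME_OVERHEAD_BYTES = 42


@dataclass(frozen=True)
class Flow:
    name: str
    period_us: int
    payload_bytes: int
    deadline_us: int
    path: tuple  # ordered link names, e.g. ("es1-sw1", "sw1-es2")


def hyperperiod_us(flows):
    """Least common multiple of all flow periods."""
    return lcm(*(f.period_us for f in flows))


def tx_time_us(flow, rate_mbps):
    """Wire time of one frame; bits divided by Mbit/s yields microseconds."""
    return (flow.payload_bytes + FRAME_OVERHEAD_BYTES) * 8 / rate_mbps


def link_utilization(flows, link, rate_mbps):
    """Fraction of the link capacity consumed by the flows routed over it."""
    return sum(tx_time_us(f, rate_mbps) / f.period_us for f in flows if link in f.path)
```

A utilization above 1 on any link rules out every schedule immediately, whereas a utilization below 1 is necessary but not sufficient for schedulability under the no-wait or zero-jitter models.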
Based on the above TSN model categorization, in a recent TSN survey [
155], we present a systematic review and experimental study on 17 representative TAS-based TSN scheduling methods comparing their performance using various metrics.
6 This work offers comprehensive experimental comparisons among selected scheduling methods, including a diverse set of TSN system models and algorithms focusing on real-time scheduling of time-triggered traffic. The comparison results demonstrate that there is no one-size-fits-all scheduling method that can achieve dominating performance in all scenarios. Furthermore, diverse experimental settings complicate the fair evaluation of scheduling methods without introducing bias, which can make conclusions from previous studies valid only under specific settings. These findings also confirm the inherent complexity of TSN traffic scheduling, which remains an open problem.
4.4 TSN Testbeds
With all the benefits of TSN for industrial automation, a crucial step before its deployment at real-world industrial sites is to validate that it meets the stringent requirements posed by industrial automation applications. In general, three primary methods are used for evaluating TSN protocols and systems: theoretical analysis, simulation, and hardware testbeds [
132]. Many theoretical analysis frameworks have been developed to evaluate TSN, e.g., [
58,
79,
159]. However, these analysis frameworks make certain assumptions and abstract the behaviors of TSN systems compared to real-world settings. Simulation-based evaluation is another popular option, and simulation tools, e.g., OMNeT++ and NS-3, have been widely used in TSN research [
38,
45,
93]. The advantages of simulation include flexibility, reduced cost, and scalability. However, simulations do not involve real hardware components, which makes it impossible to demonstrate applicability in real industrial settings. Thus, a higher-fidelity approach is to use a dedicated physical testbed based on real hardware to conduct well-defined experiments.
Physical testbeds offer many benefits to the design and evaluation of TSN systems, enabling researchers and developers to explore, validate, and optimize their TSN solutions. The solutions can be rigorously evaluated in a controlled environment, ensuring that they meet the stringent industrial requirements. TSN testbeds also facilitate the assessment of interoperability between devices from different vendors. In addition, they help identify and address network configuration challenges and cybersecurity vulnerabilities, thereby mitigating deployment risks and ensuring a smooth transition to TSN-enabled industrial networks. However, the development of a TSN testbed is challenging from several points of view, ranging from implementation cost to sharing capability and fidelity. Moreover, replicating real-world industrial conditions in a controlled testbed environment is difficult, and the cost and resource requirements, including specialized hardware, software, and skilled personnel, can be significant.
Since TSN is a family of standards, TSN-related testbeds can be built to study different TSN aspects, including traffic scheduling, packet processing, communication over-the-air, performance measurement, and network configuration. There have been a number of TSN testbeds developed for industrial applications and they can be generally classified into (1) general TSN testbeds, (2) OPC UA TSN testbeds, and (3) wireless TSN testbeds. General TSN testbeds (e.g., [
40,
102,
132]) focus on the fundamental TSN functions, e.g., scheduled traffic, credit-based shaper, and time synchronization, to achieve real-time communication and deterministic behavior. OPC UA TSN testbeds (e.g., [
26,
109]) evaluate the integration of OPC UA and TSN to ensure the seamless flow of information among devices from multiple vendors. Wireless TSN testbeds (e.g., [
65,
123]) are built to explore the possibility of extending TSN capabilities to wireless media, including Wi-Fi and 5G. We will discuss the opportunities of wireless TSN in Section
6.3, and readers can refer to [
169] for more details on the current TSN-related testbeds.
5 Challenges
This section summarizes a number of challenges inherent to TSN standards that should be addressed. We follow the structure of Section
3 to discuss the specific challenges associated with each of the four pillars, i.e., time synchronization, latency guarantee, reliability, and resource management.
5.1 Time Synchronization
Network-wide time synchronization is the foundation of all TSN features aimed at achieving deterministic real-time communication. IEEE 802.1AS is defined within TSN to provide accurate time synchronization using the gPTP protocol as described in Section
3.1. In the following, we discuss several key challenges that impact the accuracy and reliability of time synchronization, e.g., fault tolerance, synchronization overhead, and multi-level hierarchy.
One of the primary challenges in TSN is to maintain precise synchronization across all network devices when applying the master-slave-based gPTP protocol. In a multi-hop TSN network, synchronization errors can occur, leading to synchronization failures [
97]. These errors include time value error, i.e., incorrect time-related information (e.g., timestamp error) carried in propagated messages between nodes, and asymmetry in network delay, where the time difference between transmission delays from master to slave and vice versa causes errors [
139]. Clock drifts, due to the frequency drift of crystal oscillators, can cause gradual deviation of time clocks in various nodes over time, resulting in synchronization errors. In addition, security attacks, where compromised devices in the synchronization spanning tree propagate erroneous time information, can also lead to accumulated errors and synchronization failures.
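The effect of delay asymmetry can be illustrated with a generic two-way timestamp exchange. The sketch below is not the exact gPTP message set; it assumes a simplified PTP-style exchange in which the master sends at `t1`, the slave receives at `t2` and replies at `t3`, and the master receives at `t4`. The function names and the turnaround time are hypothetical. The offset estimator implicitly assumes equal delays in both directions, so any asymmetry shows up as an error of half the delay difference.

```python
def two_way_exchange(true_offset_ns, delay_ms_ns, delay_sm_ns, t1_ns=0, turnaround_ns=1_000):
    """Simulate the four timestamps of a two-way exchange.

    true_offset_ns: slave clock minus master clock.
    delay_ms_ns / delay_sm_ns: one-way delays master->slave and slave->master.
    """
    t2 = t1_ns + delay_ms_ns + true_offset_ns  # receive time, read on the slave clock
    t3 = t2 + turnaround_ns                    # reply time, read on the slave clock
    t4 = t3 - true_offset_ns + delay_sm_ns     # receive time, read on the master clock
    return t1_ns, t2, t3, t4


def estimate_offset_ns(t1, t2, t3, t4):
    """Offset estimate that assumes symmetric path delays."""
    return ((t2 - t1) - (t4 - t3)) / 2


# Symmetric path: the estimate equals the true offset of 1000 ns.
symmetric = estimate_offset_ns(*two_way_exchange(1_000, 400, 400))
# Asymmetric path: the estimate is off by (500 - 300) / 2 = 100 ns.
asymmetric = estimate_offset_ns(*two_way_exchange(1_000, 500, 300))
```

Because the error depends only on the difference between the two one-way delays, it cannot be removed by exchanging more messages; it has to be measured or bounded separately.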
To enhance resilience to synchronization failures, IEEE 802.1AS provides only a basic level of redundancy, relying on the Best Master Clock Algorithm (BMCA) to switch to a new Grandmaster (GM). To address this limitation, IEEE P802.1ASdm [
1] defines a hot standby mechanism to maintain two time domains simultaneously without relying on BMCA [
156]. However, addressing synchronization failures may require additional frequent exchanges of timing messages, consuming communication bandwidth and potentially causing back pressure on the centralized control plane, especially in large-scale applications [
86]. A trade-off between the synchronization accuracy and incurred overhead should be investigated where the settings of sync messages (e.g., transmission period) can be optimized.
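A back-of-the-envelope view of this trade-off is sketched below. It assumes a free-running oscillator with a constant frequency error and no rate correction between sync messages, which is a pessimistic simplification; the function names and the example values (100 ppm, 125 ms) are chosen for illustration only.

```python
def drift_error_ns(drift_ppm, sync_interval_ms):
    """Offset accumulated between two sync messages.

    1 ppm over 1 ms equals 1e-6 * 1e-3 s = 1 ns, so the product is already in ns.
    """
    return drift_ppm * sync_interval_ms


def max_sync_interval_ms(drift_ppm, error_budget_ns):
    """Longest sync interval that keeps the drift-induced error within the budget."""
    return error_budget_ns / drift_ppm


def sync_messages_per_second(sync_interval_ms):
    """Message rate implied by a given sync interval."""
    return 1000 / sync_interval_ms


# A 100 ppm oscillator drifts 12.5 us over 125 ms if left uncorrected.
uncorrected_ns = drift_error_ns(100, 125)
# Keeping the same oscillator within 1 us would require a sync message every 10 ms.
required_interval_ms = max_sync_interval_ms(100, 1_000)
```

Halving the error budget doubles the message rate in this model, which is why the sync period is a natural parameter to optimize against the available bandwidth.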
Moreover, industrial automation networks introduce further complexity with multi-level hierarchies on network switches, where different hierarchies may have varied synchronization quality. Since TSN standards operate at the MAC layer, even slight time slips in the upper layer can significantly affect the lower layer. The heterogeneity and accuracy differences among connected devices make a fully centralized time synchronization solution difficult to achieve in large-scale industrial automation. Therefore, applying a time synchronization scheme in industrial automation requires consideration of both network hierarchy and topology, which impacts the propagation mechanism of the synchronization messages.
5.2 Latency Guarantee
In TSN, low latency guarantees are typically achieved through well-designed flow control, which includes traffic shaping and flow scheduling. Traffic shaping relies on various TSN shapers, each defining the traffic forwarding mechanism on TSN switches. Flow scheduling generates a network-wide schedule deployed on each device, specifying the timing of every transmitted frame. Building on the various TSN shapers introduced in Section
3.2, this section focuses on discussing the key challenges associated with each TSN shaper.
5.2.1 IEEE 802.1Qbv.
Although the key idea of the IEEE 802.1Qbv Time-Aware Shaper (TAS) mechanism is rather simple, there is an inherent complexity in generating the GCLs, i.e., deciding the right time instances to open and close the gates. This complexity is due to the NP-completeness of the TSN scheduling problem [
74], and thus, no polynomial time scheduling algorithm exists unless
\(P=NP\). To this end, many TAS-based scheduling methods have been developed, and these solutions can be classified into two categories. The first class aims to construct specialized search algorithms, i.e., by developing heuristics, meta-heuristics, or genetic algorithms (e.g., ant colony optimization (ACO) [
49] and meta-heuristic search algorithms [
7]). The second class leverages general-purpose tools, such as integer linear programming (ILP) [
87] or satisfiability modulo theories (SMT) solvers [
36] to find the exact solutions.
The primary challenge of generating TAS-based schedules is how to manage the trade-off between efficiency and precision. This trade-off arises from two main considerations. First, the choice of scheduling model, such as whether to allow flow preemption or frame fragmentation and whether to generate the schedule and routing path jointly, impacts this balance. Using a more complex scheduling model, i.e., enabling the above options, can theoretically enhance system schedulability (i.e., the number of scheduled flows in the system) since it provides a larger search space. However, this also incurs higher computational overhead, which can be counterproductive in practice, especially in resource-constrained systems where the algorithm cannot find a feasible schedule in a reasonable amount of time. The second consideration is the choice of scheduling method category, i.e., heuristics or exact solutions. Heuristic algorithms are more efficient, particularly in large-scale networks, but in many cases they may fail to find a feasible schedule even when one exists. An exact algorithm, on the other hand, always finds a feasible solution if one exists and thus exhibits superior schedulability in small-scale networks.
Besides the precise configuration of switches, TAS imposes high performance requirements on end stations: scheduling end-to-end frame transmissions requires the co-design of TSN end stations and gate scheduling on switches. Many commercial TSN switch products (e.g., TTTech Evaluation Board [
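To give a feel for the heuristic side of this trade-off, the sketch below implements a deliberately simple first-fit heuristic for a single link. It is not one of the surveyed methods: time is discretized into integer slots, each flow is a hypothetical `(period, length)` pair, deadlines are assumed to equal periods, and every instance of a flow uses the same offset (a zero-jitter assumption). Flows are placed in order of increasing period at the earliest collision-free offset, with no backtracking.

```python
from math import lcm


def first_fit_schedule(flows):
    """Assign one transmission offset per flow on a single link.

    flows: list of (period, length) pairs in integer time slots.
    Returns a list of offsets in input order, or None if the heuristic fails.
    """
    hyper = lcm(*(period for period, _ in flows))
    busy = [False] * hyper
    offsets = [None] * len(flows)
    # Shorter periods first: they have the fewest candidate offsets.
    order = sorted(range(len(flows)), key=lambda i: flows[i][0])
    for i in order:
        period, length = flows[i]
        for offset in range(period - length + 1):
            slots = [k * period + offset + d
                     for k in range(hyper // period)
                     for d in range(length)]
            if not any(busy[s] for s in slots):
                for s in slots:
                    busy[s] = True
                offsets[i] = offset
                break
        else:
            return None  # no offset worked; earlier choices are never revisited
    return offsets
```

For example, `first_fit_schedule([(4, 1), (8, 2), (4, 1)])` returns `[0, 2, 1]`. The absence of backtracking is what keeps the running time low, and it is also why a `None` result from such a heuristic does not prove infeasibility, in contrast to an exact ILP or SMT formulation.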
64] and Cisco Industrial Ethernet 4000 Switch [
35]) can support real-time and high-throughput (e.g., 1 Gbps) traffic with microsecond-level precision. However, the design of real-time TSN-compatible end stations is much more challenging and remains an open problem [
68,
153]. Another notable challenge of TAS-based scheduling is the co-scheduling of time-triggered (TT) traffic and synchronization traffic. If transmissions of the two traffic types collide, the synchronization error can exceed its bound, resulting in network failure or deadline misses for TT traffic.
5.2.2 IEEE 802.1Qbu.
IEEE 802.1Qbu Frame Preemption is beneficial to achieve bounded low latency, especially for critical traffic by preempting the transmission of non-critical traffic. The standard, however, only defines a one-level frame preemption paradigm where frames are classified into express frames or preemptable frames, depending on the criticality of the frames. While one-level preemption can ensure the transmission of high-priority critical traffic to some extent and is relatively simple to implement, it suffers from low flexibility since frames of the same category cannot preempt each other. To address this issue, some studies (e.g., [
91]) have proposed the concept of multi-level preemption. By introducing more frame categories, multi-level preemption allows for finer-grained preemption between frames. This approach enhances flexibility and can more effectively reduce frame latency. However, it also significantly increases the configuration complexity. For applications requiring deterministic real-time performance, the worst-case analysis of a multi-level preemption TSN network becomes highly complicated.
TSN supports the concurrent operation of multiple shapers (e.g., TAS and CBS) on the same egress port, and thus utilizing frame preemption in such complex TSN setups can bring many benefits [
39]. However, considering that the generation of the GCL is already an NP-hard problem, as described in Section
5.2.1, the use of frame preemption on combined TSN shapers would further elevate the difficulty and complexity of the configuration. Without highly effective and efficient traffic scheduling and configuration methods, combining so many functions could have adverse effects, such as incorrect configurations that fail to ensure timing correctness [
12].
Since each occurrence of preemption divides the frame transmission into more segments, additional context switching is required. Therefore, the overhead introduced by preemption is another crucial consideration. Specifically, each preemption incurs a fixed overhead of 12 bytes, as well as the 12-byte InterFrame Gap (IFG) required between two consecutive transmissions [
90]. Moreover, when considering multi-level preemption, each preemption level introduces additional hardware implementation overheads. Thus, although the benefits of preemption are evident, addressing the trade-off between the performance gains from frame preemption and the associated overhead presents a significant challenge.
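The per-preemption cost quoted above translates directly into wire time. The following sketch, with hypothetical function and constant names, simply converts the 12-byte fixed overhead plus the 12-byte IFG into nanoseconds for a given link rate; it does not model any hardware implementation overhead.

```python
PREEMPTION_OVERHEAD_BYTES = 12  # fixed overhead per preemption, as stated in the text
IFG_BYTES = 12                  # inter-frame gap between consecutive transmissions


def preemption_overhead_ns(num_preemptions, rate_mbps):
    """Extra wire time caused by preempting a frame num_preemptions times."""
    extra_bits = num_preemptions * (PREEMPTION_OVERHEAD_BYTES + IFG_BYTES) * 8
    return extra_bits * 1000 / rate_mbps  # bits / (Mbit/s) = us, times 1000 = ns


# One preemption costs 192 ns at 1 Gbps and 1920 ns at 100 Mbps.
cost_gigabit_ns = preemption_overhead_ns(1, 1000)
cost_fast_ethernet_ns = preemption_overhead_ns(1, 100)
```

The absolute cost per preemption is small, but it scales with the number of preemptions a frame suffers, so a preemptable frame that is interrupted repeatedly on a slower link can lose several microseconds per hop.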
5.2.3 Other Shapers.
The CBS shaper avoids starvation for best-effort flows at the expense of the transmission delay of higher priority and presumably more critical flows [
19]. Although CBS is straightforward to implement, the timing performance of networks applying CBS is complex to analyze. In addition, TSN networks with high-volume traffic may suffer from poor delay guarantees under CBS [
118]. The PS shaper coordinates operations for both enqueue and dequeue processes, ensuring that all frames are transmitted exactly within their designated time slots. This strict timing requirement means that PS shapers necessitate precise alignment of cycle times, making them less adaptable to asynchronous networks. On the other hand, the ATS shaper aims to achieve bounded low latency for mixed-type traffic without global time synchronization. ATS provides less determinism for critical traffic than TAS but ensures a better average latency of all streams, as evaluated in [
173]. However, the current formula for the ATS delay bound is rather conservative, and more precise timing analysis is required.
While TSN defines various shapers that can provide real-time deterministic performance for critical traffic, this is usually based on the assumption of a homogeneous network where all devices support these shapers, and there is global network time synchronization. However, industrial automation systems typically include a variety of devices, e.g., PLCs and other legacy equipment. TSN’s vendor-independent interoperability feature allows for the existence of such heterogeneous networks within industrial systems. In heterogeneous networks with unscheduled and/or unsynchronized devices, meeting timing requirements remains a significant challenge. Designing effective scheduling mechanisms and timing analysis methods is essential to address this issue. These mechanisms need to ensure that even in the presence of diverse device capabilities and synchronization states, the network can still meet the stringent timing requirements of critical traffic [
17].
5.3 Reliability
TSN enhances the reliability of industrial networks through several standardization efforts, including IEEE 802.1CB, IEEE 802.1Qca, and IEEE 802.1Qci, as described in Section
3.3. However, these standards do not specify the exact implementation methods, leaving many open research questions on how fault tolerance can improve TSN reliability. In general, enhancing TSN reliability involves providing transmission redundancy in both the space and time dimensions.
TSN standards typically use space redundancy. Specifically, IEEE 802.1Qca allows the creation of multiple paths between talkers and listeners for communication, while IEEE 802.1CB defines how to send duplicate traffic frames over different paths and eliminate redundant copies at the destination. This approach is well-suited for handling permanent faults, such as link breaks. The number of faults that can be tolerated depends on the number of redundant paths created [
9]. However, space redundancy consumes significant network resources since the redundant paths are typically pre-established with bandwidth pre-allocated, regardless of whether faults occur during the operation. In addition, configuring multiple redundant paths and frame copies increases the complexity of network scheduling.
In contrast, time redundancy based on retransmission is more cost-effective. It creates multiple redundant copies of individual frames over time for retransmission. Unlike space redundancy, time redundancy is better suited for handling transient faults, e.g., packet loss and data error, which may result in incorrect reception and compromised data integrity [
162]. The efficiency of time redundancy is also evident in its ability to differentiate the fault probabilities between different links. Indeed, the possibility of faults varies among links due to their physical characteristics. Therefore, time redundancy can allocate a different number of retransmissions for transmissions over different hops based on this information. Research in this area primarily focuses on how to meet reliability requirements, e.g., transmission success rates, with the minimum number of retransmissions [
48].
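The idea of allocating a different number of retransmissions per hop can be sketched as follows. This is a simplified illustration rather than a method from the cited works: it assumes independent frame losses with a known loss probability per hop, counts the first transmission as one attempt, and greedily adds an attempt to the hop where it yields the largest gain in log-success probability. The function name is hypothetical.

```python
from math import log, prod


def min_attempts(loss_probs, target):
    """Per-hop transmission attempts needed to reach an end-to-end success target.

    loss_probs: per-hop frame loss probabilities, each in [0, 1).
    target: required end-to-end delivery probability, in (0, 1).
    """
    if not 0 < target < 1:
        raise ValueError("target must be in (0, 1)")
    if any(not 0 <= p < 1 for p in loss_probs):
        raise ValueError("loss probabilities must be in [0, 1)")
    attempts = [1] * len(loss_probs)

    def success():
        # A hop succeeds unless all of its attempts are lost.
        return prod(1 - p ** n for p, n in zip(loss_probs, attempts))

    while success() < target:
        # Gain in log-success from one more attempt on each hop.
        gains = [log(1 - p ** (n + 1)) - log(1 - p ** n)
                 for p, n in zip(loss_probs, attempts)]
        best = max(range(len(gains)), key=gains.__getitem__)
        attempts[best] += 1
    return attempts
```

For a two-hop path with loss probabilities 0.1 and 0.01 and a target of 0.995, the sketch returns `[3, 2]`: the lossier hop receives more attempts, which is the differentiation between links described above.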
However, both space redundancy and time redundancy methods introduce additional network resource overhead, inevitably impacting other system performance, e.g., schedulability. To further improve resource utilization, adopting resource-sharing methods to provide redundancy is also effective [
80,
83]. For example, in space redundancy methods, multiple paths can share one or more links, where partially disjoint paths can result in duplicate frames at intersection switches. In time redundancy methods, multiple traffic flows can share some time slots for retransmissions [
166]. However, these resource sharing methods must involve precise analysis of transmission success probabilities by considering various potential transmission scenarios, which poses a great research challenge. An alternative approach to avoiding these highly complex analyses is to use learning-based methods, e.g., federated learning [
48], to protect a network with probabilistic link failures.
It is also crucial to make TSN resilient to adversarial attacks. TSN addresses this by defining IEEE 802.1Qci, which provides QoS protection through traffic suppression and blocking. 802.1Qci performs per-stream filtering and policing to protect against unnecessary bandwidth consumption, burst sizes, and malicious or improperly configured endpoints [
151]. It can also be used to confine network faults to specific areas, minimizing the impact on other parts of the network [
78]. Although 802.1Qci is a published standard, there has been little research on deploying the standard on industrial network devices. One major challenge is how to configure the policing and filtering mechanisms of 802.1Qci, as misconfigurations can result in legitimate packets being filtered out or malicious packets being forwarded [
44], which degrades the network reliability and resilience.
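The configuration sensitivity can be illustrated with a simplified per-stream policer. The sketch below is a plain single-rate token bucket and is not the flow-meter algorithm specified by the standard; the class name, parameters, and example values are assumptions made for illustration. It assumes timestamps are non-decreasing.

```python
class TokenBucketPolicer:
    """Simplified per-stream policer: single-rate token bucket measured in bytes."""

    def __init__(self, rate_bps, burst_bytes):
        self.rate_bps = rate_bps
        self.burst_bytes = burst_bytes
        self.tokens = float(burst_bytes)  # bucket starts full
        self.last_ns = 0

    def allow(self, now_ns, frame_bytes):
        """Return True if the frame conforms and is forwarded, False if it is dropped."""
        elapsed_ns = now_ns - self.last_ns
        self.last_ns = now_ns
        # Refill: bits per second -> bytes per nanosecond is rate / 8e9.
        refill = elapsed_ns * self.rate_bps / 8e9
        self.tokens = min(self.burst_bytes, self.tokens + refill)
        if frame_bytes <= self.tokens:
            self.tokens -= frame_bytes
            return True
        return False


# Illustrative stream contract: 8 Mbit/s with a 1500-byte burst allowance.
policer = TokenBucketPolicer(rate_bps=8_000_000, burst_bytes=1500)
```

If `burst_bytes` is set below the largest legitimate frame of the stream, every such frame is dropped; if `rate_bps` is set far above the stream's contract, a misbehaving talker passes unchecked. Both are instances of the misconfiguration risk noted above.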
5.4 Resource Management
Resource management is essential for provisioning and managing network resources in TSN. It can significantly impact network performance across various aspects, including network deployment, network configuration, traffic scheduling/routing, fault recovery, and network security. TSN primarily relies on the IEEE 802.1Qcc standard for resource management, complemented by the YANG model defined in IEEE 802.1Qcp, which provides a unified data template for network device configuration.
IEEE 802.1Qcc provides a set of tools for globally managing and reconfiguring the network and specifies three configuration models that differ in their architecture, as described in Section
3.4.2. In general, each model has its strengths and weaknesses, and no single model is applicable to all industrial scenarios [
115]. The centralized model controls and manages traffic flows across the entire network, offering precise configuration and reconfiguration to meet timing and reliability requirements due to its global network knowledge [
165]. However, this model has several flaws. The reliance on a single centralized controller makes the network vulnerable; if the controller fails, the network must maintain its current configuration and operating status until the controller is restored, rendering it unable to respond to network dynamics (e.g., adding new traffic) or failures. In addition, centralized models suffer from poor scalability. In large-scale networks, their response times can be considerably long due to reliance on the CNC and multicast broadcasting mechanisms to handle various network dynamics [
164]. Furthermore, since a large amount of the computational workload is concentrated on the centralized controller, its computational performance can become a bottleneck for the entire network. On the other hand, the distributed model avoids the added complexity and single point of failure associated with centralized management and provides a much faster response to network dynamics since it does not require extensive configuration information exchange across the entire network. However, it converges slowly and may result in transmission collisions, and thus falls short of the network performance achieved by centralized methods. Therefore, selecting the appropriate resource management model and specific configuration methods based on the particular industrial application scenario and the corresponding application QoS requirements is a significant challenge. This decision must balance the trade-offs between complexity, responsiveness, scalability, and performance to ensure optimal network operation tailored to the unique demands of each industrial setting.
Although IEEE 802.1Qcc is a published standard, the specified functions of the introduced CNC and CUC are not clearly defined. The implementation of the communication interface UNI between these TSN elements also needs further study. To this end, an ongoing standard, IEEE P802.1Qdj [
2], specifies enhancements to the UNI to include new capabilities to support bridges and end stations to extend the configuration capability. It also clarifies the functions of CNC and CUC, and stipulates the YANG model used for the communication between CNC and CUC. However, there is very limited research on these standards, leaving many challenging issues to be studied, e.g., the selection of an appropriate resource management protocol among many candidates, including NETCONF, CORECONF, and RESTCONF [
21].
Furthermore, enabling efficient and effective network reconfiguration in response to various TSN network dynamics is a challenging task. For efficiency, industrial automation requires on-the-fly control and configuration to handle network dynamics without causing system downtime [
33]. This requires avoiding complex reconfiguration algorithms, e.g., SMT-based solutions, which take a long time to solve. For effectiveness, online reconfiguration must still meet stringent QoS requirements, particularly timing guarantees for critical traffic, even during dynamic adjustments. In this regard, centralized methods have an advantage since they have global network information. However, given the complexity of GCL configuration and routing determination, this remains a highly challenging problem.
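As a rough illustration of what a lightweight alternative to solver-based reconfiguration might look like, the sketch below admits a new flow incrementally with a greedy first-fit search and leaves existing reservations untouched. It is a hypothetical heuristic under strong simplifying assumptions (a fixed route, equal-length slots, one frame per cycle, no deadlines, no wrap-around across cycles), not a method taken from the surveyed literature; the function and link names are invented for the example.

```python
def add_flow(reserved, path, slots_per_cycle):
    """Greedily place a new flow without moving existing reservations.

    reserved: dict mapping link name -> set of occupied slot indices.
    path: ordered list of links that the new flow traverses.
    Returns the slot chosen on each hop, or None if the flow does not fit.
    Assumption: a frame sent in slot s on one hop can be forwarded on the
    next hop in slot s + 1 at the earliest.
    """
    chosen = []
    slot = 0
    for link in path:
        busy = reserved.get(link, set())
        # Advance to the earliest free slot on this link.
        while slot < slots_per_cycle and slot in busy:
            slot += 1
        if slot >= slots_per_cycle:
            return None  # reject; existing reservations stay unchanged
        chosen.append(slot)
        slot += 1
    # Commit only after every hop has been placed successfully.
    for link, s in zip(path, chosen):
        reserved.setdefault(link, set()).add(s)
    return chosen


reserved = {"sw1-sw2": {0, 1}, "sw2-sw3": {2}}
print(add_flow(reserved, ["sw1-sw2", "sw2-sw3"], 8))  # [2, 3]
print(add_flow(reserved, ["sw1-sw2", "sw2-sw3"], 8))  # [3, 4]
```

The search runs in time proportional to the number of hops plus the number of slots, so it can run online. The price is that it may reject a flow that a global re-optimization could have admitted by rearranging existing reservations, which reflects the efficiency versus effectiveness tension discussed above.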
Industrial automation systems may involve legacy or off-the-shelf end systems (e.g., PLCs) that are unscheduled and/or unsynchronized. Dynamic reconfiguration for such heterogeneous TSN networks introduces another level of complexity since the TSN flows need to pass through the non-TSN network [
103]. This brings significant uncertainty to latency and jitter, requiring precise timing analysis to preserve the determinism of critical flows.
7 Conclusion
The industrial automation market is still dominated by Ethernet-based fieldbus systems, particularly those with real-time capabilities, e.g., EtherCAT, PROFINET IRT, POWERLINK, and SERCOS III. Although these technologies are based on conventional Ethernet, they are not designed to interoperate with fieldbus systems from other vendors. In the context of industrial automation, a large number of devices from different vendors with diverse QoS requirements are expected to communicate across all levels of the automation pyramid. Thus, TSN has the potential to enable modern industrial automation by establishing universal physical and data-link layer standards. TSN consists of a set of Ethernet-based protocols and standards designed to address a wide range of practical industrial use cases with guaranteed timing requirements in heterogeneous networks. Because TSN encompasses a broad scope, it is critical to understand the standards systematically rather than focusing on just one characteristic or component. This paper provides a comprehensive review of TSN standards in industrial automation, including both published standards and in-progress drafts. We specifically focus on the automation industry, discussing the challenges and opportunities when applying TSN to industrial control applications. In addition, we highlight promising research directions for TSN design and development in industrial automation, such as optimizing current TSN standards and integrating TSN with other technologies.