Master failures in the Precision Time Protocol

Stefano Rinaldi


2008


Georg Gaderer¹, Stefano Rinaldi², and Nikolaus Kerö³
¹ Research Unit for Integrated Sensor Systems, Austrian Academy of Sciences
² University of Brescia
³ Oregano Systems Design and Consulting GmbH

Abstract – If all clocks within a distributed system share the same notion of time, the application domain can gain several advantages. Among those is the possibility to implement real-time behavior, accurate time stamping, and event detection. However, with the widespread application of clock synchronization another topic has to be taken into consideration: fault tolerance. The well-known clock synchronization protocol IEEE1588 (Precision Time Protocol, PTP) is based on a master/slave principle, which has one severe disadvantage: the failure of a master automatically requires the election of a new master. A master election is started by a timeout and thus takes a certain time span during which the clocks are not synchronized and run freely. Moreover, the use of a new master also requires new delay measurements, which prolong the time of uncertainty as well. This paper analyzes the results of such a master failure and proposes democratic master groups instead of hot-stand-by masters to overcome this problem. It is shown by means of simulation that the proposed solution does not deteriorate the accuracy of the slave clocks in case of a master failure.

Keywords – Fault tolerance, Clock Synchronization, Computer Networks, IEEE Keywords

INTRODUCTION

Clocks¹ representing the same notion of time have many advantages in distributed systems, the most obvious being the possibility to set coordinated actions such as synchronized communication. This can be used to establish real-time behavior, which is compulsory for TDMA schemes. Another application for synchronized clocks is the identification, ordering, and quantization in terms of timing of events in a distributed system.
Again, the applications for this approach are widespread; a well-known example is the LAN eXtensions for Instrumentation (LXI) [1,2], where test and measurement devices are synchronized over Ethernet in order to conveniently set up a-posteriori triggering. Approaches to meet these synchronization requirements are well studied and usually take advantage of communication networks. Synchronization is done by periodically exchanging messages to align the clocks with respect to each other. Synchronizing clocks this way is common, for example in the Internet standard Network Time Protocol (NTP) or in the more accurate IEEE1588 [2] Precision Time Protocol (PTP) standard.

PTP is based on a master/slave principle: a previously elected master synchronizes its slaves via multicast messages². However, for a considerable number of applications even a temporary failure of the clock synchronization is by no means acceptable. The PTP protocol handles recovery from a failure by means of the so-called best master clock (BMC) algorithm; while a new master is being elected, all slaves within a synchronization (or multicast) domain run with free-running, unsynchronized clocks. This paper proposes an approach in which multiple masters are tied together into a so-called master group, where one or more masters may fail without any of the nodes noticing the failure, so that the synchronization accuracy does not deteriorate.

The remainder of this paper is structured as follows: After an analysis of the state of the art, namely the master election process in IEEE1588, the approach to synchronize within the group is elaborated and the proposed system is shown in a simulation experiment. Finally, a conclusion rounds off the paper and gives an outlook on future research.

¹ The work presented in this paper is partly funded by the European Fund for Regional Development (EFRE).
STATE OF THE ART

State of the art clock synchronization techniques can use two different paradigms: the master/slave based principle and the democratic approach. The first method elects one dedicated master in order to synchronize all other nodes. In contrast, democratic algorithms use the clocks of several nodes, which are then combined into an agreed clock value. Both approaches have advantages and disadvantages: master/slave based clock synchronization is easy to implement and to debug, and the dependencies within an environment running such a protocol are simple. Democratic approaches, on the other hand, add a certain degree of complexity to a system but have the advantage that they can offer fault tolerance. Faulty or malfunctioning clocks can be sorted out without sacrificing even short-term accuracy.

A. IEEE 1588

The basics of IEEE1588-2002 and version 2008 are specified in the respective standard document [2]. Secondary literature giving an overview is available as well [3]. As this paper focuses on the fault case in which a master’s clock is considered wrong, special attention has to be given to the master election process.

² Version 2008 of the IEEE1588 standard (approved at the time of submission of this paper) allows synchronization via unicast messages as well.

ISPCS 2008 – International IEEE Symposium on Precision Clock Synchronization for Measurement, Control and Communication, Ann Arbor, Michigan, September 22–26, 2008. 978-1-4244-2275-3/08/$25.00 ©2008 IEEE

Figure 1: State transition diagram of the PTP protocol

For this master election, the standard itself defines a collapsed state transition diagram rather than a flattened state diagram. A simple election case is shown in figure 1, which is derived from the PTP specification. It is evident from this figure that, in case a master fails, a node formerly in slave mode has to wait at least one synchronization interval to fall back into the listening (L_i,j) state and then two additional synchronization timeouts until a new master can be elected. With the IEEE1588 default synchronization interval of 2 s, the unsynchronized time is therefore at least 6 seconds (three intervals). This estimate does not, however, cover the time the servo needs to recover from the period without control information. The situation is even worse because the slave node has to wait some additional, randomized time until it can initiate a delay measurement. Until this time is reached, only a guess of the line delay can be taken into account.

B. Democratic algorithms

The principle of democratic clock synchronization algorithms stands in sharp contrast to the approach of IEEE1588: in general this type of synchronization considers all clocks rather than the value of a single dedicated master. The technique is implemented in such a way that all democratic synchronizers broadcast their clock values at predefined times. A so-called convergence function is subsequently used in order to find an agreement on the common time. Several approaches have been proposed for the convergence function; an extensive overview can be found in [4]. Figure 2 shows an application of such a convergence function. As depicted, every node runs free in its pure phase; the dot-dashed line represents the clock value, whereas the dashed lines refer to the accuracy interval.
The start size as well as the growth of this interval is defined by the system’s hardware and software. For example, one major influencing factor for this interval is the quality of the clock’s oscillator. Depending on its class, a degradation of the value over time has to be considered. Other major influencing factors, such as aging, can be found in [5]. The oval framed part in the figure shows the application of the convergence function. As already noted, several proposals for such convergence functions have been discussed in the literature. The proposals range from simple averaging of all values, which is obviously the best choice to take advantage of statistical distributions of clock readings, to Marzullo class operations, where not only the actual value of the clock but also its own estimate of the accuracy is considered [4].

Figure 2: Principle of the convergence (often also referred to as agreement) function

For the remainder of this paper, a convergence function which is not based on accuracies is used. It is well known under the name Fault Tolerant Average (FTA), proposed by Lundelius Welch and Lynch [9]. The selection function combines the broadcast clock values in two steps. First, the values of all received clocks are sorted, and the clocks with the highest and the lowest value are discarded. In a second step, the mean of the remaining clock values is calculated. This algorithm has the advantage of being simple; however, it suffers from the disadvantage that at most two faulty clocks can be sorted out, one at each extreme. If more clocks fail, wrong clock values influence the result of the averaging. Finally, another disadvantage of this algorithm has to be mentioned: in order to ensure sufficient fault tolerance, at least four democratic members have to be set up so that enough clock values remain after sorting out.
For the investigations of this paper, namely the failure of a single master node, these disadvantages can be neglected.
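The two steps of the Fault Tolerant Average described above can be illustrated with a minimal Python sketch. It is not taken from the paper's simulation framework: the function name `fault_tolerant_average`, the `discard` parameter and the sample readings are assumptions made purely for illustration.

```python
from statistics import mean


def fault_tolerant_average(clock_values, discard=1):
    """Sketch of the FTA convergence function (hypothetical helper).

    Sorts the received clock values, drops the `discard` highest and
    `discard` lowest ones, and returns the mean of the remaining values.
    """
    if len(clock_values) < 2 * discard + 1:
        raise ValueError("not enough clock values left after discarding")
    ordered = sorted(clock_values)
    kept = ordered[discard:len(ordered) - discard]
    return mean(kept)


# Hypothetical clock readings (arbitrary units); one node is far off.
readings = [100.0, 101.0, 99.0, 100.5, 250.0, 100.0]
agreed = fault_tolerant_average(readings)
print(agreed)  # 100.375: the outliers 99.0 and 250.0 are ignored
```

With `discard=1` the faulty reading of 250.0 does not influence the agreed value. A second faulty reading at the same extreme would survive the selection and enter the mean, which is the limitation noted above.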

PROBLEM ANALYSIS AND PROPOSED SOLUTION

The PTPv1 standard enables clock synchronization with high accuracy while limiting both the computational effort of the devices and the network bandwidth utilization. In terms of synchronization, no additional management of the nodes is required. For this reason it is widely used in low-cost sensor network applications. However, the master/slave hierarchy, which the PTP stack implements to simplify the data exchange, is its main weakness, as described in the previous section.

Figure 3 shows a simulation of a master/slave based PTPv1 system in which, after a long settling-in period (more than 1.5 hours), an artificial error is injected into the active master, which triggers the master election. In detail, the error was forced by cutting off the communication link of the master, so that it stops transmitting timing information to the slave nodes. The simulated network is composed of 4 PTP nodes, a master and 3 slaves. The synchronization interval is 2 s. For the simulation parameters, typical values for a low-cost real-world system have been taken, which have proven to produce reasonable results [10]. It can be seen that the maximum error is around 20 µs during the approximately 90 second failure period. The relatively high offset between simulation second 5720 and 5760 (5780) is due to the lack of a delay measurement and the use of a guess for the communication delay. The 2008 version of the IEEE 1588 standard provides for a shorter synchronization interval. Thus the maximum time offset among the clocks due to a master fault can be reduced but not completely removed, because the nodes are still running freely during the master election.

Figure 3: Behavior of IEEE1588 v1 slaves in case of master failure (plot "Simulated Fault of a IEEE 1588 Master": clock offset in s, scaled by 10^-6, over simulation time in s, for PTP Clock 0 (Master 1), PTP Clock 1 (Master 2), PTP Clock 2 and PTP Clock 3)

The problem with master failures in IEEE 1588 networks is quite obvious. It can be tackled using several approaches. On the one hand, a whole network could be synchronized by a democratic approach. However, this has several drawbacks. First of all, it is convenient to have dedicated masters with fault tolerant hardware, or at least hardware better suited for synchronization, such as OCXOs, TCXOs or even rubidium or GPS sourced clocks. Equipping all nodes with expensive hardware is not justifiable for large networks. Furthermore, fully democratic networks suffer from a highly increased communication load, as every node needs to broadcast its clock value to all others at a given time. Such a communication requirement is not always easy to meet in networks used for low-cost embedded systems.

On the other hand, a solution based on more intelligent slaves could be developed. In this case each slave has to be able to synchronize its local clock using time information collected from more than one master. This solution has some problems that limit its use. A dedicated protocol that modifies the PTP stack has to be implemented in each node of the fault tolerant network in order to allow the exchange of time data among the masters. Generally speaking, this is not feasible for several reasons. Pre-existing PTP nodes in a plant cannot be connected to such networks because they are not able to work in a multi-master environment. Hence, all nodes have to feature dedicated hardware and software and are thus more expensive devices.

Therefore, a hybrid approach for Ethernet, which was already proposed in previous publications, is used [8]. This approach foresees a group of high performance masters that synchronize themselves using a democratic scheme. This group appears as one single (ordinary) master from the perspective of an IEEE1588 slave, as shown in figure 4. The network element that is supposed to keep the agreed time and transmit it to all slaves by means of multicast synchronization packets is the switch, which has to be designed with special care for fault tolerance by duplicating critical hardware. This is also reasonable because the switch is the single point of failure in an Ethernet network. In contrast to the previous work, this approach directly tackles the problem of master failure and is moreover simplified by accuracy-less clock synchronization. Furthermore, the dedicated democratic synchronization stack is implemented only in the high performance master group sub-network, whereas the rest of the network can use the standard PTP stack. Thus the fault tolerant network can be built from standard PTP devices, even ones already present in the plant, without the need to develop ad-hoc devices at slave level. The democratic stack is derived from the IEEE1588 protocol and keeps as much compatibility with the standard as possible. For instance, the time information among the democratic nodes is exchanged by means of synchronization messages whose structure is the same as that of PTP sync messages. Additional information about the democratic stack can be found in [12].

Figure 4: Master Group Approach

A. Simulation Environment

The analysis of the behavior of the synchronization algorithm during fault conditions should be made in a controlled and repeatable environment, a requirement that is well met by a simulator. For several reasons it is not easy to perform all feasibility tests in a real network. First of all, it is not possible to stop a full-fledged user network for such experiments, because production must not be halted just for an experiment. This makes large-scale experiments almost impossible. On the other hand, implementing a dedicated experimental network with hundreds of nodes linked in a special topology would be too expensive and take too much time. Moreover, it is difficult in such a network to create the fault condition in a repeatable way in order to record the timing data. A great part of this problem can be solved using a simulation environment. The user can easily modify the number of nodes of the network under test, its topology and the complexity of the model, and adapt the simulation time as needed. It is also easy and often helpful to reduce the simulation speed when it is important to catch the details, or to increase it in order to reduce the computational effort.

A discrete event simulator, OMNeT++ [7] (free for academic use under the Academic Public License, APL), has been chosen because it is an open source environment and its models are easy to adapt to describe several devices. This simulation tool allows the development of network device models in C++ code. The interconnections among the sub-models in OMNeT++ are defined by means of a topology description language, the NED language. The developer can decide on the complexity level of each model in order to meet the requirements of the application without overloading the computational capability of the simulation host. For instance, it is possible to create a model for both the hardware and the software part of the system in case great accuracy is needed, or to simplify otherwise. This feature is very useful during the implementation of a synchronization simulation framework, where it is very important to select the right level of detail for each system part.

The test of the master group approach described in the previous section has been performed by means of a simulation framework for synchronization already presented and evaluated in [8]. In this paper the framework has been adapted and configured in order to test the behavior of synchronization (PTPv1, democratic and master group) in case of a single master fault.

B. Simulation Setup for Master Fault Evaluation

Figure 5 shows the network used to test the algorithm. The network itself is composed of two different sub-networks: the first implements a democratic algorithm for synchronization, the second a master/slave approach (IEEE 1588). In this setup the sub-networks are interconnected by means of a switch. This solution allows several tests to be performed by simply changing the configuration parameters of the same network topology. For instance, it is possible to test the synchronization accuracy of PTP or democratic algorithms concurrently on the same physical network using different multicast groups, or to test the master group approach by joining the two sub-networks. The latter is done by the so-called master group speaker, which is the aforementioned fault tolerant switch.

During the implementation of a framework for testing synchronization techniques, great attention has to be paid to the development of the oscillator model. The parameters assigned to this model are typical values of a low-cost oscillator for sensor networks [10]:
• Oscillator frequency: 10 MHz
• Oscillator class: 100 ppm
• Aging factor: 1 × 10^-10 1/day
The reliability of the simulation results can be improved by a more accurate oscillator model that also considers stochastic phenomena, like the one described in [6].

The experiments presented in this paper are used to test the behavior of the master group approach during a group member fault condition. In particular, this paper handles the problem of permanent master failure. For this reason the network is composed of 6 nodes for the democratic and 4 for the PTP sub-network. The number of nodes for the democratic backbone is implicitly defined by the FTA algorithm: as this algorithm removes two nodes, at least three have to remain in order to obtain useful results. This way the synchronization accuracy obtained by the two techniques can be directly compared.
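To give a feeling for the oscillator parameters quoted for the simulation (class 100 ppm, aging factor 1 × 10^-10 per day), the following Python sketch estimates how far a free-running clock could drift during the 6 s election gap derived earlier. It is only a deterministic worst-case illustration, not the oscillator model of the simulation framework. The function name `free_running_offset`, the linear aging law, and the reading of the aging factor as a fractional frequency change per day are all assumptions.

```python
SECONDS_PER_DAY = 86400.0


def free_running_offset(elapsed_s, rate_error_ppm=100.0, aging_per_day=1e-10):
    """Time offset (in s) accumulated by an uncorrected clock.

    Assumed model: fractional frequency error y(t) = y0 + a * t, so the
    offset is the integral y0 * t + 0.5 * a * t**2.
    """
    y0 = rate_error_ppm * 1e-6            # initial fractional frequency error
    a = aging_per_day / SECONDS_PER_DAY   # aging expressed per second
    return y0 * elapsed_s + 0.5 * a * elapsed_s ** 2


# Election gap from the text: three synchronization intervals of 2 s each.
gap_s = 3 * 2.0
print(f"worst-case offset after {gap_s:.0f} s: {free_running_offset(gap_s):.3e} s")
```

Under these assumptions the rate error dominates completely: a clock sitting at the 100 ppm class limit would accumulate about 600 µs in 6 s, while the aging term is negligible on this time scale. This is an upper bound for an entirely uncorrected clock and not a reproduction of the simulated behavior in figure 3, where a maximum error of around 20 µs is reported.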
62 DemClk_0 DemClk_1 DemClk_2 DemClk_3 DemClk_4 DemClk_5 Distribution of the Clock offset around the Failure Event (PTP Clock 1) Fault Tolerant Switch PTPClk_0 PTPClk_1 PTPClk_2 Probability 0.8 PTPClk_3 Democratic subnetwork 0.6 0.4 0.2 PTP subnetwork Figure 5: Simulation setup for master fault evaluation 0 -6 -4 5000 -2 5200 SIMULATION RESULTS 0 5400 5800 The simulation environment addressed in the previous section is subsequently used to setup the following case: A master group is used to synchronize a group of PTP slaves. By means of error injection one of the masters is set to a faulty state. The assumed fault model for this changes the jitter of the node and clock-rate as well as the ability to control increment. Figure 6 shows the clock values of this master group member before and after the failure. For illustration purposes the jitter is shown via a sliding window function which is applied piecewise over 100 two secondsamples of the offset to absolute ideal simulation timescale. The dotted (red) line in this picture denotes the time window in which the failure condition is triggered. It can be clearly seen how the single clock slides away from the rest of the synchronization domain. The clock offset is aligned to the ideal master group agreement time, i.e., the virtual master time for all slaves. 4 Clock Offset (s) Window Position (s) Figure 7: Behavior of the IEEE1588 slave during a master group member failure The time offset of each node from the node 0 in subnetwork (democratic master group and IEEE1588 slaves) is shown in table 1. The results are reported before and after the failure of the democratic clock #5. It can be clearly seen how the fault of a member of master group doesn’t change significantly the synchronization accuracy of the IEEE 1588 slaves. On the other hand, the failure of a democratic member has some effect on the master group synchronization accuracy, as shown by the slight increase of standard deviation. 
This way the number of nodes in the network decreases which reduces the filtered noise by the democratic algorithm. Network Clocks DemClk 0 DemClk 1 DemClk 2 DemClk 3 DemClk 4 DemClk 5 PTPClk_0 PTPClk_1 PTPClk_2 PTPClk_3 Distribution of the Clock offset around the Failure Event (Faulty Democratic Clock 2) 1 0.8 Probability x 10 2 5600 0.6 0.4 Offset (Before) [ns] Mean Std. Dev. 0 0 0 34 2 35 0 34 -1 34 6 35 0 0 -6 45 -4 43 -3 43 Offset (After) [ns] Mean Std. dev. 0 0 3 35 6 37 1 36 5 35 0 0 -7 42 0 44 -9 44 0.2 Table 1: Synchronization offset during a master group member failure 0 The democratic algorithm provides more synchronization accuracy then IEEE1588 protocol, as shown in the table (the standard deviation is 10 ns lower). As discussed in the previous section, a more accurate oscillator model, that takes also non-deterministic noise, has to be developed in order to obtain more reliable results about the synchronization accuracy from the simulator. 5000 5200 5400 5600 5800 Window Position (s) 0 -0.5 -1 -1.5 -2.5 -2 x 10 -5 Clock Offset (s) Figure 6: Failure and drift of a Master Group Member Far more interesting than this obvious result is the influence of clock failures on a simple PTP slave. This condition is shown in figure 7 where the identical setup has been used. This qualitative analysis shows that the slave is obviously not at all influenced by a single master failure. 63 -6 REFERENCE ES [1] [2] [3] [4] [5] [6] [7] [8] Figure 8: Behavior of master group member d during power up The simulator is also useful to analyzee the behavior of synchronization algorithms in particular situuations, which are difficult to cover in real-life networks. An eexample for such an scenario is in the terms of measurementt difficult task to cover the power-up phase of a widely distribbuted network. In this case the simulation can assist to innvestigate and to improve synchronization performance. 
Figure 8, for instance, shows that the democratic algorithm of the master group nodes, described in the previous section, has a very long convergence time at start-up. This can be a problem for some applications, and thus more investigations are needed.

CONCLUSION AND OUTLOOK

Especially in fault sensitive environments, the loss of clock synchronization can have fatal consequences. In the case of master/slave based clock synchronization as defined in IEEE1588, the failure of a master leads to a master re-election, during which the slaves are not synchronized. It is shown, based on simulation results, that the offset during this time can reach excessively high values. This problem can be tackled by introducing the concept of master groups: a cluster of IEEE1588 enabled masters which synchronize as a group via a democratic algorithm. In this case the fault of a master can be handled transparently, without any observable effect on the slave side.

Further investigations will cover a comparison between several synchronization algorithms, such as those based on Marzullo functions, as well as an analysis of the startup behavior. Moreover, it has to be investigated how the combination of several masters into a group can improve the quality of the reference which is transferred to the slaves.

REFERENCES

[1] J. C. Eidson, The Application of IEEE 1588 to Test & Measurement Systems, LXI Whitepaper, 2005.
[2] LXI Standard. LXI Consortium. www.lxi-consortium.org
[3] IEEE1588 D2.2 Active Approved Draft Standard for a Precision Clock Synchronization Protocol for Networked Measurement and Control Systems, IEEE, Mar. 2008.
[4] John C. Eidson, Measurement, Control, and Communication Using IEEE 1588 (Advances in Industrial Control), April 2006, Kindle Book.
[5] Schmid, U. (ed.), Special Issue on the Challenge of Global Time in Large Scale in Distributed Real-Time Systems, 1997.
[6] Georg Gaderer, Patrick Loschmidt, Aneta Nagy, et al., “An Oscillator Model for High Precision Synchronization Protocol Discrete Event Simulation”, Paper in Print, 2007 International Conference on Precision Time and Time Intervals, Long Beach, 2007.
[7] OMNeT. OMNet++ Discrete Event Simulation System. www.omnetpp.org
[8] Praus F., Granzer W., Gaderer G., Sauter T., “A Simulation Framework for Fault-Tolerant Clock Synchronization in Industrial Automation Networks”. In Proceedings of the IEEE Conference on Emerging Technologies & Factory Automation (ETFA07), pp. 1465-1472, Patras, Greece, Sept. 2007.
[9] Jennifer Welch-Lydelius and Nancy Lynch, A new fault-tolerant algorithm for clock synchronization, Inf. Comput. Vol 1, 1997, pp. 1-36.
[10] Gaderer, G., Loschmidt, P., and Sauter, T., “Quality Monitoring in Clock Synchronized Distributed Systems”. In Proceedings of the IEEE International Workshop on Factory Communication Systems (WFCS06), pp. 13-21, Torino, Italy, June 2006.
[11] Gaderer, G., Sauter, T., and Höller, R., “Strategies for Clock Synchronization in Powerline Networks”. In Proceedings of the 3rd International Workshop on Real Time Networks (RTN04), pp. 53-56, Catania, Italy, June 2004.
[12] Gaderer, G., Holler, R., Sauter, T., and Muhr, H., “Extending IEEE 1588 to Fault Tolerant Clock Synchronization”. In Proceedings of the IEEE International Workshop on Factory Communication Systems (WFCS04), pp. 353-357, Austria, September 2004.
[13] Gaderer, G., Loschmidt, P., and Sauter, T., “IEEE 1588 Real-Time Networks with Hybrid Master Group Enhancements”. In Proceedings of the 4th International Workshop on Real Time Networks (RTN05), pp. 25-29, Spain, July 2005.
