TR WinCC OA Redundant Systems en
Training module
WinCC OA
Redundant Systems
Slides
Exercises
Comments
Content
General
WinCC OA Redu-switching in detail
Redundant LAN-connections
Message Peripherals → WinCC OA
Exercise 1: Create redundant WinCC OA project
Exercise 2: Setting the error weighting
Exercise 3: Changing the config.redu
Exercise 4: Split mode
Exercise 5: Redundancy switching
Exercise 6: File synchronization between redundant servers
Redundancy and value changes from CTRL/API Manager
Note:
Availability
In theory, two parallel processes of the same kind (A), each with an availability of 0.9,
have a higher availability than a single A. The calculated availability is
A ∨ A = 1 − (1 − 0.9) · (1 − 0.9) = 0.99
In practice, however, the redundancy mechanism itself introduces an additional source of
failures. The example above may therefore reach not an availability of 0.99 but only
about 0.96 to 0.98.
Attention:
The availability figures above are only an example and do not reflect the availability
of a real WinCC OA system. The real figures may vary widely, depending on the quality of
the code that a user adds to the project (panels and scripts) and on the quality of any
user-created drivers or managers that use the WinCC OA API.
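The availability arithmetic above can be reproduced with a short sketch. The function names are hypothetical, and treating the redundancy mechanism as a series element with its own availability (0.98 here) is an assumption made only for illustration; it is not a figure for a real WinCC OA system.

```python
def parallel_availability(a: float, n: int = 2) -> float:
    """Availability of n identical, independent units operating in parallel."""
    return 1.0 - (1.0 - a) ** n


def with_switching_mechanism(a: float, mechanism: float) -> float:
    """Assumption: the redundancy mechanism behaves like a series element."""
    return mechanism * parallel_availability(a)


ideal = parallel_availability(0.9)
# 0.98 is a made-up availability for the switching mechanism itself.
realistic = with_switching_mechanism(0.9, mechanism=0.98)
print(f"ideal: {ideal:.4f}, with mechanism: {realistic:.4f}")
```

With these illustrative inputs the second value falls inside the 0.96 to 0.98 range mentioned in the text.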
Note:
A redundant WinCC OA system requires at least two hosts with an identical operating
system and an identical WinCC OA version installed.
In addition, the project path has to be the same on both servers (make sure that both
servers have the same number of hard disk drives installed).
Note:
Client-Server - concept
The Event manager (EV) and the Data manager (DM) act as servers; the other managers act
as clients. The connections between them are established via TCP/IP.
The managers communicate by means of proprietary “messages”. The structure of these
messages is defined by the API, as is the handshake mechanism.
The different managers (EV, D, DM) take care of “their” configs (i.e. the
parameterization of the periphery addresses and some calculations on these values are
carried out directly by the respective driver and not by the Event manager).
The EV has complete knowledge of all process states (i.e. the most recent value of every
process variable with its timestamp and status bits).
When one of the connected managers queries data, the EV answers the message itself if the
query concerns current data. If historical data is queried, the message is forwarded to
the DM, which returns the data via the EV.
Note:
The basic concepts of redundancy in WinCC OA:
Redundancy is a manager-internal feature that all WinCC OA managers support inherently.
When WinCC OA is used as a redundant system, all required managers of the project run
on two separate but connected systems. The remote UIs are in most cases installed on
non-redundant workstations (but there are normally several equal UIs at a control
center) and they are all connected to the central redundant server pair.
The UI holds a connection to both EV-managers, but only the active one takes the
messages sent from the UI to update its process view; it also forwards these value
changes to its passive partner.
The passive EV-manager in turn updates its own process view, so the two separate process
views always stay in sync.
The dynamic switching from active to passive, and the other way round on the other
system, can therefore take place very fast and is also carried out by the EV-manager.
For the initiation and synchronization of a redu-switch the EV-manager relies on a
specialized Redu-manager, a special control script and several internal data points.
The connected UIs automatically follow the currently active server, which ensures that
they show the correct and current process view at all times.
(From the point of view of a UI, the redundancy switch takes place completely in the
background and is therefore seamless, with no interruption to the operability of the
UI.)
Redundancy state:
The redundancy state is monitored and controlled by the Redu-managers, which monitor
all the vital information (such as error states, redu states or connection-alive
states) not only from their own system but also from the redu partner's system. These
states are combined into a system error state for each system, and the two system error
states are then compared with each other. The system with the better (lower) error
state becomes the active system. These calculations are carried out by a control script
(calculateState.ctl).
Error states:
The calculation of the error states considers the states of all existing WinCC OA
managers as well as their current connections. Values from data points or the current
memory consumption may also be monitored and contribute to the error states.
Because all this information is stored on data points in the internal WinCC OA
database, the parameters of this calculation can easily be changed individually. All
such changes can be made in the system overview panel, without changing internal
scripts or doing any advanced programming.
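The weighted error state described above can be illustrated with a minimal sketch. This is not the logic of calculateState.ctl; the class, the function names, the monitored items and their weights are all hypothetical and only show the principle of summing the weights of the current errors and comparing the two systems.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class MonitoredItem:
    description: str
    weight: int      # parameterized error weighting (illustrative value)
    in_error: bool   # True if this item currently reports an error


def error_state(items: List[MonitoredItem]) -> Tuple[int, int]:
    """Return (sum of current error weights, sum of all possible weights)."""
    current = sum(i.weight for i in items if i.in_error)
    maximum = sum(i.weight for i in items)
    return current, maximum


def lower_error_system(state_1: int, state_2: int) -> int:
    """Return 1 or 2 for the system with the lower error state, 0 on a tie."""
    if state_1 == state_2:
        return 0
    return 1 if state_1 < state_2 else 2


system_1 = [MonitoredItem("driver 1 connection", 40, False),
            MonitoredItem("memory consumption", 20, False),
            MonitoredItem("monitored data point", 100, False)]
system_2 = [MonitoredItem("driver 1 connection", 40, True),
            MonitoredItem("memory consumption", 20, False),
            MonitoredItem("monitored data point", 100, False)]

cur_1, max_1 = error_state(system_1)
cur_2, max_2 = error_state(system_2)
print(f"system 1: {cur_1}/{max_1}, system 2: {cur_2}/{max_2}")
print("system with lower error state:", lower_error_system(cur_1, cur_2))
```

In this example system 2 has lost its driver connection, so system 1 has the lower error state and would be the candidate for the active role.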
Note:
The UI is connected to both systems (EV), but it only accepts messages with updates to
the process view from the currently active system (in our example, the left system is
the currently active one). Because the UI is connected to both systems, commands are
also sent to the EV-managers of both systems; the passive EV, however, drops these
messages and thus ignores them.
The Redu-managers establish a direct connection between each other, which is used for
synchronization in case of a redundancy switch.
Both calculateState scripts calculate the current error state of their system (the
details were mentioned on the previous slides).
The results are stored on data point elements of the internal data point named
_ReduManager (or _ReduManager_2 for the second system)…
… and by means of a fwdDp entry (placed in the config.redu file) the calculated values
of the own system are forwarded to the other system's _ReduManager (or _ReduManager_2)
data points.
Note:
Each Redu manager now knows the error states of both systems and, taking some
additional rules into account, can derive the current redundancy state.
If the Redu managers have to invoke a redundancy switch of their systems, they exchange
the information about the planned redu-switch with each other (for double-checking).
Note:
If both Redu managers come to the same conclusion that a redundancy switchover is
necessary (which should be the case in the vast majority of situations), each Redu
manager informs its local EV-manager to carry out the redu-switchover.
Both EV-managers set their _ReduManager data point to the appropriate values, for
instance the current and the upcoming redu state. This change of data then triggers the
redu-switching.
Note:
Both EV-managers send out a system message that informs the other EV-manager (and every
other manager in the system) that their state is about to change from active to passive
or vice versa. This message has to be acknowledged by the other EV-manager via the
handshake mechanism …
… after that the changed redu state is written to the internal _ReduManager data
points …
… at this stage the redu-switching is finished.
Note:
The UI also receives the system message from the previously active EV-manager and
automatically switches to the other, now active, system.
The whole switching procedure is carried out completely in the background and goes
unnoticed by the user (although the active and passive states, and thus the redundancy
switching, can of course be observed on the system overview panel).
Hint:
When a redundant LAN connection is used, IP forwarding has to be disabled at the
operating system level. Otherwise the OS would try to reroute the messages from the
broken connection to the still working connection, which doubles the network traffic on
that connection and may therefore lead to collisions and packet loss.
Detecting the network failure would also be tricky, because once the OS has found an
alternate route, the “broken” network connection is still shown as up and running.
Hint:
To achieve better fault tolerance, WinCC OA managers can also be connected to WinCC OA
managers running on other computers via redundant network connections (i.e. a
connection established over two physically separate networks, with two network cards in
each machine).
If you configure redundant network connections in a (distributed) system, the messages
are sent over two physically different connections. When the system receives a message,
it accepts the message immediately. If the system then receives the identical message
through the second connection, it discards that message (i.e. the system does not delay
the execution of the first incoming message in order to wait for the same message on
the second network connection).
This is possible because every message in WinCC OA has a unique, constantly rising
message id. Messages with an id lower than the most recently received one can be
discarded safely.
If one of the networks fails, there is no need for a network switch or anything
similar, because each message still reaches the other system at least once.
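The duplicate handling on a redundant LAN can be sketched as follows. This is a simplified model under stated assumptions, not the WinCC OA implementation: the class and function names are hypothetical, ids are assumed to start at 1, and a message is represented only by its id and the NIC it arrived on.

```python
class DuplicateFilter:
    """Accept a message only if its id is higher than the last accepted id."""

    def __init__(self) -> None:
        self.last_id = 0  # assumption: message ids start at 1

    def accept(self, message_id: int) -> bool:
        if message_id <= self.last_id:
            return False  # second copy (or older message): discard
        self.last_id = message_id
        return True       # first copy: process immediately, do not wait


def deliver(arrivals):
    """Return the ids that are actually processed, in arrival order."""
    flt = DuplicateFilter()
    return [msg_id for msg_id, _nic in arrivals if flt.accept(msg_id)]


# Both networks working: every message arrives twice, in either order.
both_up = [(1, "nic1"), (1, "nic2"), (2, "nic2"), (2, "nic1"),
           (3, "nic1"), (3, "nic2")]
# Network 1 failed: every message still arrives once via network 2.
nic1_down = [(1, "nic2"), (2, "nic2"), (3, "nic2")]

print(deliver(both_up))
print(deliver(nic1_down))
```

Both runs process each message exactly once, which is why no explicit network switch is needed when one of the two connections fails.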
Hint:
For WinCC OA manager communication on the local host in a redundant LAN, only the local
interface is used, which is not shared with the second host.
WinCC OA manager communication between the two hosts uses the NIC 1 and NIC 2
interfaces.
Note:
The redundancy works independently and does not rely on any user interaction or
reaction. Nevertheless, some commands from the user are accepted under certain
circumstances:
Definition of a “preferred active” state for one host (this setting has the lowest
priority and is only taken into consideration if both systems have the same error
state).
Immediate, manually triggered switching in case of emergency (“forced active”; also
carried out when the calculated error state would result in a different redu state,
which means that the automatic redu-switch based on the error state calculation is
disabled → also called manual mode).
After a forced active setting, the user may decide to go back to normal operation
(leaving manual mode → going back to automatic mode).
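The priorities of the user commands listed above can be summarized in a small decision sketch. The function and its parameters are hypothetical and the additional rules of the Redu manager are not modelled; it only encodes the order stated in the text: forced active first, then the error state, and the preferred setting only when the error states are equal.

```python
from typing import Optional


def select_active(err_1: int, err_2: int, current_active: int,
                  preferred: Optional[int] = None,
                  forced: Optional[int] = None) -> int:
    """Return 1 or 2 for the system that should be active."""
    if forced in (1, 2):
        return forced                      # manual mode overrides the error state
    if err_1 != err_2:
        return 1 if err_1 < err_2 else 2   # lower error state wins
    if preferred in (1, 2):
        return preferred                   # lowest priority, only used on a tie
    return current_active                  # nothing to decide: keep the state


print(select_active(0, 0, current_active=1))                 # tie, no preference
print(select_active(0, 0, current_active=1, preferred=2))    # tie, preferred set
print(select_active(40, 0, current_active=1, preferred=1))   # error state decides
print(select_active(40, 0, current_active=1, forced=1))      # forced active
```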
Note:
Shortly after a redu switch has been carried out in a WinCC OA system, a general query
is issued for all drivers (to play it safe and retrieve the most recent process states
of all existing variables).
Note:
The above diagram shows the message logic for telegrams from the periphery to
WinCC OA.
1) The process data changes and is transmitted to both drivers (the PLCs do not know
whether a WinCC OA server is active or passive).
2) Both drivers process the new value and forward it to their EV-manager; the active EV
accepts the message, the passive one blocks the message from its driver.
Note:
3) The active EV has received the new data point value and now forwards it to its DM,
to all managers that are subscribed to this value (via dpConnect) and to the passive EV.
4) The passive EV then also forwards the message to its (passive) DM.
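Steps 1) to 4) can be traced with a toy model. It is a sketch only: the class, its methods and the data point name are hypothetical, subscribers (dpConnect) are left out, and the Data manager is represented by a simple list.

```python
class EventManager:
    """Toy model of an EV-manager in a redundant pair."""

    def __init__(self, name: str, active: bool) -> None:
        self.name = name
        self.active = active
        self.partner = None
        self.process_image = {}
        self.dm_log = []  # stands in for the local Data manager

    def from_driver(self, dp: str, value) -> bool:
        if not self.active:
            return False                      # passive EV blocks its own driver
        self._apply(dp, value)                # step 3: update and forward to DM
        self.partner.from_partner(dp, value)  # step 3: forward to passive EV
        return True

    def from_partner(self, dp: str, value) -> None:
        self._apply(dp, value)                # step 4: passive EV forwards to its DM

    def _apply(self, dp: str, value) -> None:
        self.process_image[dp] = value
        self.dm_log.append((dp, value))


ev_left = EventManager("left", active=True)
ev_right = EventManager("right", active=False)
ev_left.partner, ev_right.partner = ev_right, ev_left

# Steps 1 and 2: the PLC value reaches both drivers, each forwards it to its EV.
accepted_left = ev_left.from_driver("Pump1.speed", 42)
accepted_right = ev_right.from_driver("Pump1.speed", 42)

print(accepted_left, accepted_right)
print(ev_left.process_image == ev_right.process_image)
```

Although the passive EV rejects the message from its own driver, both process images end up identical because the value reaches it through the active partner.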
Note:
The above diagram shows the message logic for telegrams from a WinCC OA UI to
the periphery.
The passive EV does not accept messages from any manager other than its active
partner.
Note:
3) The active EV has updated its process states and sends this information to all other
managers that are subscribed to that value. At a minimum, it is sent to its DM, the
active driver and the passive EV.
4) The active driver immediately sends the new values to the peripherals.
5) The passive EV then also forwards the message to its (passive) DM.
Note:
A new configuration of the project can be tested by temporarily disconnecting the two
redundant servers and bringing them into a so-called split mode.
The split system can now be used for testing purposes. Although it does not send any
commands to the periphery, it still receives all current changes of the process data.
The active system of course continues normal operation all the time; any other
connected UI workstation notices nothing of the split mode.
Note:
During split mode both systems run as if they were stand-alone systems. The
Redu-managers therefore have nothing to do any more and are sent to stand-by mode. The
task of the Split-managers is to keep track of those values that should be exchanged
between the two systems on demand or continuously.
Note:
A new redundant project is created very easily with the WinCC OA project
administration. When an already existing project is to be converted into a redundant
system, the necessary config entries can be added to the config file manually, and the
additional Redu manager can be added to the progs file via the WinCC OA console.
Note:
With a redundant LAN each server has two NICs installed. Because each server then also
has two TCP connections for every WinCC OA manager, and WinCC OA uses the reverse DNS
lookup mechanism, we have to make sure that the two network connections can be
identified by a unique name. This is done by adding a -1 and -2 suffix to the server's
hostname. You have to ensure that your DNS knows these additional (new) server names, or
add them manually to the hosts file (on Windows machines this file can be found at
"C:/Winnt/system32/drivers/etc/hosts", on Linux at "/etc/hosts").
Attention:
Hostnames that contain a -1 or -2 are not allowed in a redundant WinCC OA system!
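The naming convention can be checked with a small helper before the entries are added to DNS or to the hosts file. This is a sketch only; the function name and the server name "server1" are hypothetical.

```python
from typing import Tuple


def nic_hostnames(host: str) -> Tuple[str, str]:
    """Derive the two per-NIC names for a server in a redundant LAN."""
    if "-1" in host or "-2" in host:
        # The base hostname itself must not contain -1 or -2.
        raise ValueError(f"hostname {host!r} must not contain -1 or -2")
    return f"{host}-1", f"{host}-2"


print(nic_hostnames("server1"))
```

Each derived name then has to resolve to the IP address of the corresponding network card.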
Hint:
All necessary config file entries are created automatically by the setup wizard. If you
want to add them manually to an already existing project, this can be done in the
following way:
data = "<host_#1>$<host_#2>"
event = "<host_#1>$<host_#2>"
When the default port settings are not used, the ports have to be added to the
hostnames as well.
All managers that usually connect to only one of the redundant systems (e.g. Control
managers) can also connect to both, if this behavior is needed, by using the following
config entry in the manager-specific section of the config file:
connectToRedundantHosts = 1 (Default is 0)
Hint:
All further settings (timeout at redu-switch, etc.) are made in a special config file,
the config.redu. A preconfigured file with useful settings can be found at
<WinCC OAPath>/config.
Note:
There are two options to bring the project to the second server:
1.) Using the wizard from the WinCC OA project administration.
Don't forget that in this case the project folder on the first server has to be set up
as a shared folder; otherwise the second redu-partner will not be able to copy the
project files!
2.) Manually copying the project to the second machine using a pen drive.
Place it in exactly the same location (same folder, same hard drive) and register it
with the WinCC OA project administration (register project button).
In this case you don't need to configure a network share.
Hint:
When the project is copied to the second redu-partner, the target path cannot be chosen
freely; the same path as on the first host is used. Make sure that the drive letter used
on the first redu-partner is also available on the second system.
Please note that the <projectSource> path must be written as a UNC path, e.g.
\\Servername\Sharename\Projectname. A mapped path is handled as a local path, which is
not allowed.
Hint:
Make sure that the source project is not running, because the whole project data is
copied to the other machine (this includes the raima-db).
Note:
The managers necessary for redu operation (Redu and Split), as well as a script that
calculates the error state of a system in the background (calculateState.ctl), are
already added to the console.
Hint:
The startup order of the managers shall not be changed, at least from the DM-manager to
the Ctrl-manager. Project-specific managers should be added to the console after the
Ctrl-manager. A “local” UI (started directly on a server) gets a manager number starting
at 7 (this is only a default and can be changed in the config file). Remote UIs should in
any case use fixed manager numbers (the -num option) ranging from 1 to 6.
Hint:
Before starting up the two projects, please make sure that the two servers have
synchronous time settings.
The usage of a time synchronization software is highly recommended.
Note:
When starting up a redundant project the user has to start both systems, one after the
other. The second host that comes up is then automatically aligned to the current
process data of the first one started.
When both systems are up and running in sync, the calculated error state decides which
system becomes the "active" and which the "passive" one. When both servers are running
with the same error state (which should be the usual case), the first one started stays
active and the second one joins as the passive (hot-standby) machine.
Note:
The system overview panel shows the detailed states of both redundant partners (host_#1
and host_#2). In the above screenshot the left system is the active one (recognizable by
the green EV-manager symbol).
The upper area of the panel displays the currently connected UIs. The tables below the
EV symbol list the locally connected managers (grouped by Ctrl, Drivers, Archives and
possibly Dist).
Directly beside the drawings of the two hosts the current error state of each system
can be monitored (the error state is shown as “current error state” / “maximum error
state possible”, the latter being the sum of all parameterized weightings). The table
directly below lists all current errors in detail (weight and description of the error).
The weighting of the monitored errors can be parameterized by right-clicking on that
table or by using the button above it with the three dots. There are also buttons to
change the preferred redu system or to set a specific system to forced active mode, as
well as a button for sending the redundant system into split mode and another one for
returning to redundancy.
Note:
The parameterization panel for the error weighting can also be opened by right-clicking
on one of the tables shown (error state, archive, driver, Ctrl, etc.).
Note:
A loss of connection to the driver with the number one now contributes 40 points to the
error state of the respective system.
Note:
The Error state text field shows the following:
"sum of all current error-weights / sum of all possible error-weights"
Notice that the numbers have changed from "1/220" to "1/240".
Note:
"ExampleDP_AlertHdl1" from data point type "ExampleDP_Bit"
Note:
Whenever the value of this data point is TRUE, it contributes a weighting of 100 to the
calculation of the current error state.
Note:
Notice that the numbers for "sum of all current error-weights / sum of all possible
error-weights" have now changed from "1/240" to "1/340".
Note:
The error state has now changed from "1/340" to "101/340", and the table below, which
shows the detailed information about the current error state, is also updated.
Because the value was set with the PARA module, it has changed on the active system
and, of course, also on the passive one. That is why there is no difference in the error
state between the two systems and no redu-switch takes place.
Note:
The config.redu file should not be copied from the WinCC OA installation folder;
instead, a new empty file should be created inside the project's config folder.
When the value of a specific config entry is to be changed for that particular project,
copy the desired entry from the config.redu of the installation folder (take care to use
the correct section, most likely [event] or [data]).
Attention:
Make sure that the config.redu file contains only redundancy-related entries, because
the config.redu file has a higher priority than the project's normal config file.
That is also the reason why redundancy-related entries must not be written in the
normal config file of a project.
Note:
If more than three drivers are used in a project, additional internal data points of
the DPT _DriverCommon have to be created.
For example, _Driver4 and _Driver4_2 need to be created so that a driver with the
number 4 works correctly in a redundant project.
No changes are needed in the config.redu file for these additional DPs, because they are
taken care of by copyDpType and fwdDpType.
Note:
Remember to copy the config.redu file to the other server as well. Both projects then
have to be restarted for the changes to take effect (config file entries are only read
at manager startup).
Note:
copyDp/copyDpType overrules fwdDp/fwdDpType.
Note:
The system that was active when going into split mode keeps the active driver
connection to the peripherals; the passive system becomes the testing system.
Clicking one of the Active Driver checkboxes switches all driver connections from one
system to the other. An active driver can both send messages to and receive messages
from the peripheral units; a passive driver only receives messages and sends nothing.
With the help of the Split manager the systems exchange internal data (connection
state, memory usage, etc.). If additional data needs to be exchanged during split mode,
the predefined data point groups SplitGet, SplitGet_2, SplitConnect and SplitConnect_2
can be adjusted accordingly.
As long as the system remains in split mode (Split manager is active), the Redu manager
is sent to stand-by mode.
Note:
On one of the two hosts a new UI is started in PARA mode that connects only to the
testing system. This is done by passing the data and event host names to which it shall
connect as options of the startup command of this UI:
-m para -extend -data <hostnameTestsys> -event <hostnameTestsys>
Note:
The data point named split of type ExampleDP_Float was created in a PARA module that
was connected only to the testing system.
This data point therefore exists only in the database of the test system and not on the
other system.
Normally a UI is connected not to just one of the two redundant hosts but to both, so
these “normal” UIs would see the data of both systems.
With the data <hostname> and event <hostname> options (either in the config file or as
startup options) a UI connects only to the system with the given hostname (the active or
the testing system) and is therefore only aware of the data existing in that system's
local database.
Note:
Selecting which of the two systems shall remain also selects which set of data is taken
back into normal operation mode. The other system is restarted, because at the restart
of a redundant system an automatic data alignment to the already running system is
performed.
Note:
When the network connection between the two servers is lost, both systems become
active, because they no longer receive alive packets from the partner and therefore
consider the other system lost.
When the network connection is up again, the system with the higher error state, or the
system that was passive before the connection failure, is restarted to ensure data
consistency.
Note:
In the case of equal error states the active system can be selected by using one of the
preferred buttons.
Note:
In the case of unequal error states the active system may be set to the right or to the
left system by using one of the force active buttons.
Note:
In a redundant system all raima-db data (PARA) is automatically synchronized between
the redundant server pair. All configuration data files (Gedi) need to be synchronized
manually with the CTRL script "filesync.ctl".
CAUTION:
Use a unique manager number for the Control Managers on both systems. Note that
the same number may not be used on both redu systems.
Note:
If a redundancy switch occurs during the file synchronization, the file synchronization
process is stopped (messages are shown in the log viewer). The file synchronization is
not automatically restarted or continued.
Note:
Now, when the system overview is opened, the FileSync button is present.
Note:
The settings and directories for the file synchronization can be configured in the file
synchronization panel. By default the table contains all directories that are
recommended for file synchronization.
It is possible to add more folders from within the project directory. The “data”
directory and the “backup” directory must not be synchronized, because these folders
always contain the backup files.
!!! It is only possible to synchronize from the active redundant server to the passive one. !!!
Example:
A redundant system with a Control manager running on both redundant hosts, which
calculates the sum of two values DP1 and DP2 and writes the result to a third data
point DP3.
The user triggers a value change on DP1, which triggers the calculation of the sum
within the CTRL managers on both hosts.
Let's assume the following situation:
The calculation on the passive host finishes a bit faster and the CTRL manager sends
the result to its Event manager via dpSet, which discards the value change.
While the calculation on the active host is still running, a redundancy switchover
occurs, which means that this host is now the passive one.
When the calculation is finished, the CTRL manager sends the result value to its Event
manager, which is now the passive one and also discards the value change.
In the end, the calculated sum value is lost.
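The sequence of this example can be replayed with a toy model. It is a simplified sketch, not CTRL code: the class `Ev`, its `dp_set` method and the initial values are hypothetical, and the synchronization between the two Event managers is reduced to the single rule that a passive Event manager discards value changes from its local CTRL manager.

```python
class Ev:
    """Toy Event manager: a passive one discards local value changes."""

    def __init__(self, active: bool) -> None:
        self.active = active
        self.values = {"DP1": 1, "DP2": 2, "DP3": 3}

    def dp_set(self, dp: str, value: int) -> bool:
        if not self.active:
            return False  # discarded by the passive Event manager
        self.values[dp] = value
        return True


host_a = Ev(active=True)    # active host at the beginning
host_b = Ev(active=False)   # passive host at the beginning

# The user changes DP1; both process images receive the new value.
for ev in (host_a, host_b):
    ev.values["DP1"] = 5

# The CTRL manager on the passive host finishes first.
accepted_b = host_b.dp_set("DP3", host_b.values["DP1"] + host_b.values["DP2"])

# Redundancy switchover while host A is still calculating.
host_a.active, host_b.active = False, True

# The CTRL manager on host A finishes; its Event manager is passive by now.
accepted_a = host_a.dp_set("DP3", host_a.values["DP1"] + host_a.values["DP2"])

print(accepted_a, accepted_b, host_a.values["DP3"], host_b.values["DP3"])
```

Neither dpSet is accepted, so DP3 keeps its old value on both hosts even though both CTRL managers calculated the new sum.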
Important note:
This avoids loss of value changes but can lead to duplicated values after the
redundancy switch, because the dejavue queue is processed again after redundancy
switch over.