
DISTRIBUTED SIMULATION OF ELECTRIC POWER SYSTEMS

J. Jatskevich¹, O. Wasynczuk¹, N. Mohd Noor¹, E. A. Walters², C. E. Lucas¹, and P. T. Lamm³

¹ Purdue University, West Lafayette, IN, USA
² PC Krause and Associates, Inc., West Lafayette, IN, USA
³ Air Force Research Laboratory, Wright Patterson Air Force Base, Dayton, OH, USA

jatskevi@ecn.purdue.edu, wasynczu@ecn.purdue.edu, walters@pcka.com, celucas@ecn.purdue.edu, peter.lamm@wpafb.af.mil

Abstract – Recent advancements in computer networking have enabled the interconnection of inexpensive desktop computers to form powerful computational clusters that can be effectively utilized when simulating electric power systems. In order to maximize the computational gain, it is necessary to identify the computational tasks that can be performed concurrently and to optimally distribute those tasks among the available processors. In this paper, the electrical power system is viewed as a collection of interconnected dynamical subsystems, each described by a set of differential/algebraic equations. The tasks that can be performed concurrently are identified, and a new approach of optimally distributing the corresponding calculations is set forth. The effectiveness of the proposed approach is demonstrated by distributing a detailed simulation of the Western System Coordinating Council (WSCC) three-machine nine-bus system on an SCI-based network composed of three personal computers. The simulation includes the effects of network and stator transients. Using the Runge-Kutta-Fehlberg integration algorithm, the simulation speed was increased by a factor of 2.064.

Keywords – power systems, distributed simulation, parallel computing

1 INTRODUCTION

Computer simulations developed using ATP/EMTP [1], ACSL [2], Matlab/Simulink [3], EASY 5 [4], etc., are commonly used to predict the transient behavior of power systems and energy conversion devices. In each case, the underlying systems of differential and algebraic equations (DAEs) or ordinary differential equations (ODEs) are solved numerically in the time domain using available integration algorithms and/or solvers. For example, in ATP/EMTP, trapezoidal integration is used. Other packages offer a library of solvers that may be suitable for different problems. In general, the computational complexity is determined by the ODE solver used and increases as a polynomial function (typically cubic) of the size of the system to be simulated (the problem size). Thus, as the system to be simulated grows in size, so does the time required for computing the solution of the corresponding equations. Even though the speed of modern computers continues to increase at a rapid rate, the need to model more complex electrical systems with as many details taken into account as possible still makes the computing time a critical issue in many practical cases. Distributing the computational burden onto several computers appears to be a natural way of improving the simulation speed.

Since the modeling of power systems using a state-variable approach involves solving an initial value problem (IVP) with the corresponding DAEs or ODEs, the numerical integration is the process that determines the overall simulation speed. Examples of recently developed techniques for performing numerical integration on parallel computers include parallel implementations of waveform relaxation [5], parallel one-step methods [6]-[7], parallel multi-step methods [8], and others [9]. Unfortunately, none of the simulation languages [1]-[4] support parallel ODE solvers. Implementation of any of the previously mentioned methods into an existing simulation program may require a significant amount of effort and may not always be feasible. An alternative method of distributed simulation proposed in [10] requires fixed-time-step integration and communication between the component models at fixed intervals. Both fixed-step ODE solvers and fixed communication intervals may be inappropriate when modeling power systems with switching and/or inter-component stiffness.

On the other hand, the speed of sequential ODE solvers available within a given simulation program may also be improved using parallel computers. In order to see how this improvement can be realized, it is necessary to note that most sequential solvers spend a significant portion of the overall CPU time evaluating the derivatives of the state variables. In this paper, an approach herein referred to as the distributed evaluation of state derivatives (DESD) technique is considered. Utilizing this technique, the integration of the overall system is performed on one computer (master) using a single integration algorithm, whereas the state derivatives are computed in parallel on other computers (servers). DESD can be implemented as shown in Fig. 1. In addition to evaluating the derivatives, sequential ODE solvers perform other serial calculations (solving systems of equations, finding coefficients, etc.). Due to these solver-dependent serial calculations, the maximum theoretical speed-up of DESD on an idealized computer network is bounded by Amdahl's law [11]. Additionally, in an actual implementation, the communication latencies can be significant, and the theoretical speed-up is never achieved.
[Figure 1: Distributed evaluation of state derivatives. Server nodes 1 through m-1 each receive the state vector x and return a block of derivatives dx; the ODE solver runs on master node m.]

In this paper, a simulation of the electromechanical transients of the Western System Coordinating Council (WSCC) three-machine nine-bus system has been implemented on a Scalable Coherent Interface (SCI)-based network of three personal computers. This simulation includes the stator and network electrical transients (using lumped-parameter transmission line models), the rotor winding electrical transients, the machine excitation systems, and the lumped-inertia mechanical dynamics. A procedure for subdividing the overall model and assigning the subsystems to the network computers is set forth. It is shown that by dividing the overall model into several subsystems and distributing them among the computer nodes, the time required for evaluating the state derivatives can be reduced by a factor of 2.407. Computer studies are performed using two integration algorithms: variable-order Gear's and second-order Runge-Kutta-Fehlberg. Due to differences in the serial calculations performed by each solver, the overall simulation speed was increased by 1.739 and 2.064 times, respectively.

2 DISTRIBUTED SIMULATION

In a network of computers, an improvement in computational performance can be achieved through parallelism. For the purposes of discussion, it is assumed that a complex problem can be divided into m independent tasks. The total CPU time required to solve these tasks is denoted by T(n), where n represents the number of computer nodes. If these tasks are solved sequentially on one computer, the total required CPU time is

T(1) = \sum_{i=1}^{m} \tau_i    (1)

where τ_i is the absolute time required to complete the i-th task. On the other hand, if these tasks are distributed among the computer nodes, the overall computing time can be reduced. In this case, however, additional time is required to communicate the variables among the network nodes, and it can be expected that the overall performance of an n-computer network will be limited by the node that has the slowest compute-plus-communication time. If T(1) > T(n), it is appropriate to use the speed-up factor

S(n) = \frac{T(1)}{T(n)}    (2)

In a state-variable-based simulation, the overall CPU time taken by a general-purpose sequential solver can be expressed as

T_{sim} = N ( C_{solver} + k_{solver} T_{der} )    (3)

where N is the total number of integration steps, C_solver is the time spent by the solver performing internal serial calculations (solving equations, updating coefficients, computing the solution for the next step, etc.), T_der is the time to evaluate the derivative function, and k_solver is the number of derivative evaluations per step. The constants C_solver and k_solver depend on the order and type of ODE solver; however, the time T_der does not. Moreover, the term k_solver T_der typically dominates (3). This observation suggests partitioning the overall system of ODEs as

\frac{dx}{dt} = \frac{d}{dt} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_m \end{bmatrix} = \begin{bmatrix} f_1(x_1, x_2, \ldots, x_m, t) \\ f_2(x_1, x_2, \ldots, x_m, t) \\ \vdots \\ f_m(x_1, x_2, \ldots, x_m, t) \end{bmatrix}    (4)

which can be implemented as shown in Fig. 1. In DESD, a single integration algorithm is applied to the entire system as in a conventional single-computer simulation. However, calculations that can be performed concurrently are identified and implemented on separate computers (servers). Each server evaluates a subset of the state derivative vector and sends the results to the master, which updates all state variables in accordance with the selected integration algorithm. The advantage of this approach is that it can be implemented using any standard ODE solver available in programs [1]-[4]. Based on (3), the overall improvement in simulation speed may be expressed as

S(n) = 1 + \frac{T_{der}(1) - T_{der}(n)}{(C_{solver} / k_{solver}) + T_{der}(n)}    (5)

which has a form similar to Amdahl's law.
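As a concrete illustration of the exchange in Fig. 1, the following C sketch shows one distributed derivative evaluation using standard MPI point-to-point calls. The in-house SCI library described later in this paper provides a comparable message-passing interface, but this code, including the function eval_subsystem_derivs, the state count NX, and the partition lengths, is hypothetical and is not the authors' implementation.

#include <mpi.h>

#define NX 65   /* hypothetical total number of state variables */

/* Hypothetical: evaluate the derivative block f_i of the subsystems
   hosted on the given node, from the full state vector x.            */
void eval_subsystem_derivs(int node, const double *x, double *dx_block);

/* One distributed derivative evaluation, per Fig. 1: the master sends
   the full state to each server, evaluates its own block, and gathers
   the remaining blocks; part_len[i] is the number of states on node i. */
void desd_derivatives(const double *x, double *dx,
                      int rank, int n_nodes, const int *part_len)
{
    if (rank == 0) {                         /* master node              */
        int off = part_len[0];
        for (int i = 1; i < n_nodes; i++)    /* broadcast current state  */
            MPI_Send(x, NX, MPI_DOUBLE, i, 0, MPI_COMM_WORLD);
        eval_subsystem_derivs(0, x, dx);     /* master's own subsystems  */
        for (int i = 1; i < n_nodes; i++) {  /* gather derivative blocks */
            MPI_Recv(dx + off, part_len[i], MPI_DOUBLE, i, 1,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            off += part_len[i];
        }
    } else {                                 /* server node              */
        double x_loc[NX], dx_block[NX];
        MPI_Recv(x_loc, NX, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        eval_subsystem_derivs(rank, x_loc, dx_block);
        MPI_Send(dx_block, part_len[rank], MPI_DOUBLE, 0, 1,
                 MPI_COMM_WORLD);
    }
}

After the gather completes, the master advances all state variables with the selected integration algorithm, exactly as in a single-computer simulation; the routine is simply invoked k_solver times per integration step.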
Partitioning of the overall model may be performed based upon the corresponding differential equations (4). For example, (4) can be subdivided into subsystems with roughly the same number of state variables and/or floating-point operations. However, in the simulation of complex power systems, the overall state model is rarely available in the form of (4). Alternatively, component-based model subdivision may be used. In many large models, components such as machines, converters, distribution networks, loads, etc., can be clearly identified and treated as interconnected dynamical subsystems. Large transmission and distribution networks can also be subdivided at convenient topological boundaries into smaller subnetworks.
In this paper, it is assumed that the output variables can be calculated directly in terms of the associated state variables. This ensures that each dynamic subsystem is strictly proper and that it is possible to compute the state derivatives without the need to solve an algebraic equation involving output variables from other subsystems. Fortunately, this condition is often satisfied by the nature of the input-output relationships among the various component models. Improper systems are occasionally encountered, however, and it is important to be able to model such systems. Fortunately, the proposed approach can be extended to include systems with improper models and even algebraic loops (loops of improper models). The details are significantly more involved and are described in [12]. Since the example system studied does not include improper models, these details are not considered in this paper.

Once the overall model is partitioned into strictly proper subsystems, the component models can be arranged so that the necessary coupling variables are calculated locally using the respective state variables available from the master node. This arrangement is depicted in Fig. 2, wherein each subsystem communicates only with the master node. Therein, τ_i denotes the absolute CPU time required for computation of the state derivatives and coupling variables of the i-th subsystem, and µ_i is the combined two-way communication latency of the i-th subsystem, assuming the subsystem is implemented on a computer different from the master node.

[Figure 2: Computation of subsystems. Subsystems 1 through m, with computation times τ_1, ..., τ_m and two-way communication latencies µ_1, ..., µ_m, each communicate only with the master node running the ODE solver.]

In order to establish an estimate of the expected computational performance, the communication latencies must also be taken into account. Depending upon the order in which the variables are sent to and received from the server nodes, the total time needed to evaluate the entire vector of state derivatives will be bounded as follows:

C_{com} + \tau_{min} \le T_{der}(n) \le C_{com} + \tau_{max}    (6)

where

C_{com} = \sum_{i=1}^{n-1} \mu_i    (7)

Since more than one subsystem may be placed on any given computer node, τ_min and τ_max in (6) denote the smallest and the largest total computing times distributed to any of the nodes, and C_com is the total time required by the master node for communication with all of the server nodes. It should be noted that if the master node is not assigned a process, τ_min is zero. Based on (2) and (6), it is concluded that in order to achieve an improvement in computational speed, it is necessary that

C_{com} + \tau_{max} < T(1)    (8)

However, if the communication latencies are too large or the individual sub-problems are too small, DESD may actually require more time than a single-computer implementation. In particular, if, after partitioning the system, it is found that

T(1) < C_{com} + \tau_{min}    (9)

then the DESD technique will yield decreased computational speed. Based on (8) and (9), the speed of communication plays an important role in determining whether the proposed approach should be considered.
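Conditions (6), (8), and (9) are straightforward to evaluate numerically once the τ_i and µ_i have been measured. A minimal C sketch (the function and variable names are hypothetical), assuming the total compute time assigned to each node under a candidate partitioning has already been formed:

/* Check conditions (8) and (9) for a candidate partitioning:
   tau_node[k] is the total compute time assigned to node k (k = 0 is
   the master), c_com is the total master communication time per (7),
   and t1 is the single-computer derivative-evaluation time T(1).      */
int desd_worthwhile(const double *tau_node, int n, double c_com, double t1)
{
    double tau_min = tau_node[0], tau_max = tau_node[0];
    for (int k = 1; k < n; k++) {
        if (tau_node[k] < tau_min) tau_min = tau_node[k];
        if (tau_node[k] > tau_max) tau_max = tau_node[k];
    }
    if (c_com + tau_max < t1) return  1;  /* (8) holds: T_der(n) < T(1) */
    if (t1 < c_com + tau_min) return -1;  /* (9) holds: T_der(n) > T(1) */
    return 0;                             /* between the bounds of (6)  */
}

A return value of 0 corresponds to the region between the bounds in (6), where the outcome depends on the order in which variables are sent to and received from the servers.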
An important issue to be addressed is the assignment of the partitioned subsystems and their associated communication costs to the available network nodes in such a way that the overall performance is optimal. For a given network with n computers and m subsystems, there exists a finite number of ways to divide the computational work among the network nodes. If the nodes are assumed to be identical and m > n, the total number of distinct combinations can be found by evaluating the summation of Stirling numbers of the second kind [13]. For example, for the three-computer network and four subsystems, there are 14 distinct combinations. For six subsystems and the same computer network, the number of combinations increases to 122. If the numbers n and m are small, it is possible to establish the optimal assignment of tasks to the computer nodes by enumerating the possible combinations and comparing the predicted computational costs. However, this may not be practical for larger problems.
where
On the other hand, some decisions can be made immediately after analyzing Fig. 2. For example, if µ_i > τ_i, the i-th subsystem should be implemented on the master node. In this case, the absolute time spent by the master node on this subsystem is τ_i; otherwise it would be µ_i + τ_i. After all subsystems for which µ_i > τ_i are placed on the master node, the number of remaining subsystems should be compared with the number of server nodes, which is n - 1. If the number of remaining subsystems, for which µ_i ≤ τ_i, is less than or equal to n - 1, each of these subsystems can be placed on its own server node, which in turn constitutes an optimal assignment. On the other hand, if the number of subsystems for which µ_i ≤ τ_i is greater than n - 1, a more systematic approach is needed.
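The placement rules just described amount to a simple greedy pass over the measured τ_i and µ_i. The following C sketch (hypothetical names, not the authors' code) returns -1 precisely in the case where the more systematic search is required:

/* Greedy placement per the rules above: subsystems with mu[i] > tau[i]
   go to the master node; the rest each get their own server if enough
   servers (n - 1) remain, otherwise a systematic search is needed.    */
int greedy_place(const double *tau, const double *mu, int m, int n,
                 int *node_of /* out: node index for each subsystem */)
{
    int servers_used = 0;
    for (int i = 0; i < m; i++) {
        if (mu[i] > tau[i]) {
            node_of[i] = 0;                /* master node                */
        } else if (servers_used < n - 1) {
            node_of[i] = ++servers_used;   /* its own server node        */
        } else {
            return -1;  /* more candidates than servers: enumerate or
                           apply a scheduling heuristic instead          */
        }
    }
    return 0;
}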
It is also known that there is no polynomial-time algorithm capable of finding the optimal solution to the general case of the multiprocessor scheduling problem, which is NP-complete [14]. The design of algorithms that give suboptimal solutions for particular forms of this problem in a reasonable amount of time is a research area in its own right. For example, it is possible to consider the τ_i and µ_i shown in Fig. 2 as weights corresponding to respective vertices and edges, and to address this problem using weighted graphs and dynamic programming algorithms [15].

Nonetheless, for the example system considered in this paper, enumeration of all possible assignments, followed by immediate elimination of the obviously "bad" assignments as described above, provides a practical approach to optimal task scheduling. In order to implement the proposed approach, it is necessary to first establish the communication latencies µ_i and the computational complexities τ_i of each task. Each of these parameters depends upon numerous factors, including the CPU speed, the type of communication network, etc. Because of the variety of computer systems and networks currently available, the determination of µ_i and τ_i is easiest and most accurately accomplished by direct measurement.
If the computers are connected using TCP/IP, the network is reliable and the computers need not be located in close proximity to one another. However, the communication latency may be rather large and may vary with network traffic. In this regard, a more specialized network with lower communication latency may be a better choice. Therefore, SCI adapter cards [16] were used to set up a network of three 400-MHz Pentium-II personal computers. Based on the SCI software [17], a library of subroutines implementing a message-passing interface (MPI) between the network nodes was developed. These subroutines are compiled into one static library that can be linked with other applications. The developed library has been tested with ACSL as well as with stand-alone applications written in C. For networks based on MPI, the communication latency is often assumed to depend linearly on the data size [11]. A simple C routine was written to measure the communication latency for the previously described network. Based on these measurements, the two-way communication latency between two nodes of the given network may be approximated as

t_{com} = ( 0.402 \cdot d + 23.206 ) \, \mu s    (10)

where d is the number of double-precision variables, and the constant coefficient of 23.206 µs is the overhead time associated with function calls and internal interrupts.
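The measurement routine itself is not listed in the paper; a typical ping-pong test of the two-way latency, sketched here with standard MPI calls and hypothetical constants, would look roughly as follows:

#include <mpi.h>
#include <stdio.h>

/* Two-way latency for a message of D doubles, averaged over many trips. */
int main(int argc, char **argv)
{
    enum { D = 20, TRIPS = 10000 };
    double buf[D] = {0.0};
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    double t0 = MPI_Wtime();
    for (int i = 0; i < TRIPS; i++) {
        if (rank == 0) {          /* master: send, then wait for echo   */
            MPI_Send(buf, D, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, D, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {   /* server: echo the message back      */
            MPI_Recv(buf, D, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(buf, D, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
        }
    }
    double t1 = MPI_Wtime();
    if (rank == 0)   /* per (10), expect roughly 0.402*D + 23.206 us    */
        printf("two-way latency: %.3f us\n", (t1 - t0) / TRIPS * 1e6);
    MPI_Finalize();
    return 0;
}

Repeating the test for several message sizes d and fitting a line gives the slope and intercept in (10).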

Likewise, after the independent tasks are identified, the associated computational complexities can be established by direct measurement using the previously described computer network.

3 COMPUTER STUDY

In order to demonstrate the proposed approach, the WSCC three-machine nine-bus system depicted in Fig. 3 is considered. This system is described in [18] and [19], and is often used in the literature as a benchmark system for evaluating different simulation techniques. The machine parameters are summarized in Table 1. It is assumed that all three generators utilize the same excitation system, whose parameters are given in Table 2. The system depicted in Fig. 3 was implemented using a qd-representation of the synchronous machines as well as the network. The simulation includes the stator and network electrical transients (using lumped-parameter transmission line models), the rotor winding electrical transients, the machine excitation systems, and the lumped-inertia mechanical dynamics. The equations describing the synchronous machines and the associated excitation systems were taken from [18] and are therefore not given here. The transmission network was modeled using an automated state-variable formulation [20], [21]. Using this approach, the transmission network was represented as q- and d-axis equivalent circuits coupled through the respective speed-voltage and speed-current terms [22].

[Figure 3: WSCC three-machine nine-bus test system. One-line diagram with transmission-line impedances and shunt admittances, transformer taps, bus voltages, and loads; Gen 1 serves as the slack bus at 16.5 kV, 1.04 pu.]

Parameter     Generator 1   Generator 2   Generator 3
Rated MVA     247.5         192.0         128.0
Type          Hydro         Steam         Steam
ω_r, rpm      180           3600          3600
H, s          23.64         6.4           3.01
x_d           0.146         0.8958        1.3125
x'_d          0.0608        0.1198        0.1813
x_q           0.0969        0.8645        1.2578
x'_q          0.0969        0.1969        0.25
x_l           0.0336        0.0521        0.0742
T'_do, s      8.96          6.0           5.89
T'_qo, s      0.31          0.535         0.6
K_fric        4.0           3.0           3.0

Table 1: Generator parameters in pu on a 100-MVA base.

Parameter   K_a   T_a, s   K_e   T_e, s   K_f     T_f, s
Value       20    0.2      1.0   0.314    0.063   0.35

Table 2: IEEE Type-I exciter parameters.
The respective network topology and branch numbering are shown in Fig. 4. The branch parameters were established from the data given in Fig. 3.

[Figure 4: Implementation of the WSCC test system. The network is partitioned into subnetworks SNW-1, SNW-2, and SNW-3 and generator-exciter subsystems G-1, G-2, and G-3 (SM: synchronous machine; E: exciter; B: bus; b: branch; branches b19, b20, and b21 are loads).]

An apparent partitioning of the WSCC system is to have the machines and the transmission network as four separate components. The measured CPU times required to evaluate the state derivatives and coupling variables on a 400-MHz Pentium II computer are summarized in Table 3 for all subsystems, including the non-partitioned transmission network. If the transmission network is implemented as one component, the computing and communication times given in Table 3 do not satisfy (8), which implies that DESD will not produce an improvement in computational speed. In such cases, when the model components have vastly different computing times, the component with the largest time may be partitioned into yet smaller subsystems. In the WSCC test system, it is possible to divide the transmission network into three subsystems as shown in Fig. 4. These subnetworks have an equal number of branches, similar topology, and therefore equal computational complexity. Additionally, since all three generator-exciter systems are represented by the same set of equations (with different parameters), the corresponding models also have the same computational complexity. Although the overall system can be divided in numerous ways, the component-based partitioning shown in Fig. 4 was found to be the most natural from an input-output perspective. As a check, it can be verified that the final six-component partitioning satisfies (8), indicating that there is a potential for improvement in speed.

Component               State       Coupling    Exchange    Time of one call to       Time of two-way
                        variables   variables   variables   derivative function, µs   communication, µs
Entire transmission
network                 42          6           48          317                       42.49
Subnetworks SNW-1,
SNW-2, SNW-3 (each)     14          6           20          64.6                      31.24
Generator-exciter
subsystems G-1, G-2,
G-3 (each)              9           2           11          2.64                      27.63

Table 3: Computational characteristics of the subsystems.

Although there exist 122 different ways to implement six subsystems using three computers, the number of practical cases is significantly smaller. In particular, since the communication times for subsystems G-1, G-2, and G-3 are greater than their computation times, all of these component models should be placed on the master node. The remaining subsystems SNW-1, SNW-2, and SNW-3 have the same computational complexities as well as communication times, from which it can be verified that placing any two of them on one computer node would increase the overall simulation time. Therefore, the three remaining subsystems must be placed on separate nodes. Thus, it was found that the most advantageous arrangement is to implement subsystems G-1, G-2, G-3, and SNW-1 on the master node, which performs the integration of the overall state vector. The remaining two subnetworks were implemented, respectively, on the two remaining server nodes. The overall partitioning is summarized in Table 4.

Computer    Implemented   Total time to compute   Communication   Total communication   Total compute-plus-
node        subsystems    derivatives, µs         variables       time per call, µs     communication time, µs
1, master   G-1, G-2,     72.52                   2 × 20          62.484                135.004
            G-3, SNW-1
2, server   SNW-2         64.6                    20              31.242                95.842
3, server   SNW-3         64.6                    20              31.242                95.842

Table 4: Partitioning of the WSCC system.
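The per-node totals in Table 4 follow directly from the per-call times in Table 3; the short C program below reproduces the arithmetic (the two-way subnetwork latency is used with its unrounded value of 31.242 µs from Table 4):

#include <stdio.h>

int main(void)
{
    /* per-call times from Tables 3 and 4, in microseconds             */
    double tau_gen = 2.64, tau_snw = 64.6, mu_snw = 31.242;

    /* master: G-1, G-2, G-3, SNW-1, plus two-way exchange with the
       two servers; each server: one subnetwork plus its exchange      */
    double master = 3.0 * tau_gen + tau_snw + 2.0 * mu_snw; /* 135.004 */
    double server = tau_snw + mu_snw;                       /*  95.842 */

    printf("master %.3f us, servers %.3f us\n", master, server);
    return 0;
}

The predicted T_der(3) is the largest of the three compute-plus-communication totals, i.e., the 135.004 µs on the master node.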
In order to analyze the simulation speed, the following computer study was considered. The model was started with initial conditions corresponding to steady state, when generators 2 and 3 had mechanical (input) torques set to 1.65 and 0.85 pu, respectively.
Generator 1 was implemented as a slack bus that balances the power flow within the system. At t = 2.0 s, the mechanical torque of generator 2 is decreased to 0.5 pu. At t = 5.0 s, the torque is reset back to 1.65 pu, and the model continues to run until t = 8.0 s. The corresponding transients in the electromagnetic torque of each generator are shown in Fig. 5. It can be seen that generator 1 acts as a slack bus by providing additional power between t = 2.0 s and t = 5.0 s, whereas during the same time generator 3 undergoes a slight transient. The same study was performed on one computer as well as on the three-computer network using the multi-step Gear's and the second-order Runge-Kutta-Fehlberg integration algorithms. The corresponding numerical statistics are summarized in Tables 5 and 6, respectively.

[Figure 5: Electromagnetic torque transients. Electromagnetic torque T_em (pu) of generators 1, 2, and 3 over 0 to 8.0 s.]
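For reference, the torque disturbance driving Fig. 5 can be written as a simple function of time (a sketch; in the actual model this input enters the generator-2 mechanical equations):

/* Mechanical (input) torque of generator 2 versus time, per the study:
   1.65 pu initially, stepped to 0.5 pu at t = 2.0 s, and restored to
   1.65 pu at t = 5.0 s; the simulation runs until t = 8.0 s.          */
double tmech_gen2(double t)
{
    return (t >= 2.0 && t < 5.0) ? 0.5 : 1.65;
}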
Implementation    Calls to derivative   Cost per     Calculated time on          Measured overall   Estimated time on
                  function              call, µs     derivative evaluation, s    CPU time, s        other calculations, s
1-computer        18,028                324.92       5.858                       7.540              1.682
3-computer DESD   18,009                135.004      2.431                       4.336              1.905
Improvement                             2.407        2.409                       1.739

Table 5: Numerical statistics using Gear's algorithm.

Implementation    Calls to derivative   Cost per     Calculated time on          Measured overall   Estimated time on
                  function              call, µs     derivative evaluation, s    CPU time, s        other calculations, s
1-computer        1,264,271             324.92       410.787                     431.069            20.282
3-computer DESD   1,264,199             135.004      170.672                     208.892            38.220
Improvement                             2.407        2.407                       2.064

Table 6: Numerical statistics using the Runge-Kutta-Fehlberg algorithm.

4 DISCUSSION

Based on the data in Table 4, the predicted time for evaluating the state derivatives for the overall system implemented on three computers is 135.004 µs. The measured CPU time was observed to be within one microsecond of the predicted value. This time is determined as the longest compute-plus-communication time among the subsystems. For comparison purposes, the entire system was also implemented on one computer. The measured CPU time for evaluating the state derivatives was 324.92 µs. It can be noted that this time is somewhat greater than the sum of the computing times of the six subsystems. This is attributed to the nonlinear complexity of the state-model formulation [20], which makes a difference when the transmission network is divided into smaller subsystems. Thus, an overall 2.407-fold improvement in T_der was achieved.

It is important to note that, according to (5), the improvement in T_der does not result in the same increase in the overall simulation speed. Moreover, the overall improvement in speed depends upon the integration algorithm used. The data given in Tables 5 and 6 are consistent with the previous statements. In particular, when the high-order multi-step Gear's algorithm is used in the one-computer implementation, approximately 22% of the time is spent on internal serial calculations. Generally, in higher-order methods, more relative time is spent on the serial calculations needed to solve algebraic equations and update coefficients. This explains the fact that the 2.407-fold improvement in T_der produces only a 1.739-fold increase in simulation speed. When the second-order Runge-Kutta-Fehlberg method is used in the one-computer implementation, the relative time spent on serial calculations is less than 5%; therefore, the overall improvement in speed is a factor of 2.064. It should also be noted that since the example system is relatively stiff, the number of integration steps taken by each solver is dramatically different. However, since the predominant time is spent on calls to the derivative function in both integration algorithms, the DESD technique yields a significant improvement in computational speed in both cases.
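As a check, applying the speed-up factor (2) to the measured overall CPU times in Tables 5 and 6 reproduces the quoted figures:

S_{\mathrm{Gear}}(3) = \frac{T(1)}{T(3)} = \frac{7.540}{4.336} \approx 1.739, \qquad
S_{\mathrm{RKF}}(3) = \frac{431.069}{208.892} \approx 2.064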
5 SUMMARY

When using state-variable techniques to model power systems, the corresponding differential equations are typically solved using established ODE solvers. Since, for many ODE solvers, the time needed to evaluate the state derivatives often dominates the other calculations, the distributed evaluation of state derivatives approach considered in this paper can be used to improve the overall simulation speed. In this approach, the integration of the overall system is performed on one computer (master), while the state derivatives are computed in parallel on the other computers (servers). In order to demonstrate the computational gain that can be achieved, a detailed computer simulation of the WSCC three-machine nine-bus system was implemented on a network of three personal computers interconnected using SCI network technology. It was shown that the ODE-solver-dependent serial calculations and the network-dependent communication latencies are important in determining the resulting improvement in simulation speed. For the given three-computer network, a 1.739-fold increase in simulation speed was obtained using the variable-order Gear's ODE solver, while a 2.064-fold increase was achieved using the second-order Runge-Kutta-Fehlberg method. Although the example system is relatively simple when compared to actual power systems, these results demonstrate the significant potential of the proposed approach. Further work is presently underway to apply this approach to larger systems.

6 ACKNOWLEDGEMENTS

This research was supported by the Air Force Research Laboratory through PC Krause and Associates, Inc., under Contract F33615-99-C-2911, SBIR Phase II, "Multi-Level Heterogeneous Modeling of F22 Electrical Power System."

REFERENCES

[1] Alternative Transients Program / Electro Magnetic Transients Program (ATP/EMTP), June 2001 (available at www.emtp.org).
[2] Advanced Continuous Simulation Language (ACSL), Reference Manual, Version 11, MGA Software, Concord, Massachusetts, 1995.
[3] Simulink: Dynamic System Simulation for Matlab, Using Simulink Version 3, The MathWorks Inc., 1999.
[4] EASY 5 User Guide, The Boeing Company, 1999.
[5] M. L. Crow, M. Ilic, "The Parallel Implementation of the Waveform Relaxation Method for Transient Stability Simulations," IEEE Transactions on Power Systems, Vol. 5, No. 3, p. 922-932, Aug. 1990.
[6] Nguyen huu Cong, "A parallel DIRK method for stiff initial value problems," Journal of Computational and Applied Mathematics, 54, p. 121-127, 1994.
[7] P. J. van der Houwen, B. P. Sommeijer, W. A. Van der Veen, "Parallel iterations across the step of high-order Runge-Kutta methods for nonstiff initial value problems," Journal of Computational and Applied Mathematics, 60, p. 309-329, 1995.
[8] P. J. van der Houwen, E. Messina, "Parallel Adams methods," Journal of Computational and Applied Mathematics, 101, p. 153-165, 1999.
[9] K. Burrage, Parallel and Sequential Methods for Ordinary Differential Equations, Oxford Press, 1995.
[10] L. Pollini, M. Innocenti, "A Synthetic Environment for Dynamic Systems Control and Distributed Simulation," IEEE Control Systems Magazine, p. 49-60, April 2000.
[11] B. Wilkinson, M. Allen, Parallel Programming: Techniques and Applications Using Networked Workstations and Parallel Computers, Prentice Hall, 1999.
[12] N. Mohd Noor, "Distributed Simulation of Electrical Power System," M.S.E.C.E. thesis, Purdue University, May 2001.
[13] D. L. Kreher, D. R. Stinson, Combinatorial Algorithms: Generation, Enumeration, and Search, CRC Press, 1999.
[14] C.-Y. Lee, J. D. Massey, "Multiprocessor Scheduling: An Extension of the MULTIFIT Algorithm," Journal of Manufacturing Systems, Vol. 7, No. 1, p. 25-32, 1988.
[15] T. H. Cormen, C. E. Leiserson, R. L. Rivest, Introduction to Algorithms, McGraw-Hill, MIT Press, 1992.
[16] Dolphin Interconnect Solutions, "PCI-SCI Adapter Card for System Area Networks" (available at www.dolphinics.com).
[17] Dolphin Interconnect Solutions, "Low-level SCI Software Functional Specification," Esprit Project 23174 - Software Infrastructure for SCI, Version 2.1.1, March 15, 1999 (available at www.dolphinics.com).
[18] P. W. Sauer, M. A. Pai, Power Systems Dynamics and Stability, Prentice Hall, 1998.
[19] Power Systems Dynamic Analysis, Phase I, EPRI Report EL-4484, Electric Power Research Institute, July 1971.
[20] O. Wasynczuk, S. D. Sudhoff, "Automated State Model Generation Algorithm for Power Circuits and Systems," IEEE Transactions on Power Systems, Vol. 11, No. 4, p. 1951-1956, Nov. 1996.
[21] J. Jatskevich, O. Wasynczuk, L. Conrad, "Method of Evaluating Flicker and Flicker-Reduction Strategies in Power Systems," IEEE Transactions on Power Delivery, Vol. 13, No. 4, p. 1481-1487, Oct. 1998.
[22] P. C. Krause, O. Wasynczuk, S. D. Sudhoff, Analysis of Electric Machinery, IEEE Press, Piscataway, NJ, 1995.
