Deterministic Many-to-Many Hot Potato Routing
Allan Borodin ∗
Yuval Rabani †
Baruch Schieber ‡
Abstract
We consider algorithms for many-to-many hot potato routing. In hot potato (deflection)
routing a packet cannot be buffered, and is therefore always moving until it reaches its
destination. We give optimal and nearly optimal deterministic algorithms for many-to-many packet routing in commonly occurring networks such as the hypercube, meshes and
tori of various dimensions and sizes, trees and hypercubic networks such as the butterfly.
All these algorithms are analyzed using a charging scheme that may be applicable to other
algorithms as well. Moreover, all bounds hold in a dynamic setting in which packets can
be injected at arbitrary times.
∗ Dept. of Computer Science, University of Toronto, Toronto, Canada. This work was done while this author
visited the Departments of Computer Science at The Hebrew University, Jerusalem and The Weizmann Institute
of Science, Rehovot, Israel. E-mail: borodin@cs.toronto.edu
† Most of this work was done while this author was at the Lab. for Computer Science, MIT, Cambridge, MA,
supported by ARPA/Army contract DABT63-93-C-0038, and at the Dept. of Computer Science, University
of Toronto, Toronto, Canada. Present address: Computer Science Dept., The Technion, Haifa 32000, Israel.
Work at the Technion supported by a David and Ruth Moskowitz academic lectureship award, and by a grant
from the fund for the promotion of sponsored research at the Technion. E-mail: rabani@cs.technion.ac.il,
URL: http://www.cs.technion.ac.il/~rabani/
‡ IBM T.J. Watson Research Center, Yorktown, NY. E-mail: sbar@watson.ibm.com
1 Introduction
This paper studies routing in a synchronous network, in which at most one packet can traverse
any link in each time step. We consider a form of routing known as hot potato routing or
deflection routing [1, 5, 8, 9, 10, 11, 12, 13, 15, 20, 21]. The striking feature of this form of
routing is that unlike traditional store and forward packet routing, it involves no queues at
intermediate nodes. Thus packets are always moving, giving rise to the term “hot potato.” A
packet attempts to travel “towards” its destination. However, due to congestion, two or more
packets may arrive at a node and wish to exit it on the same outgoing link. In the event of such
contention, the node forwards one of the packets along the preferred edge. The other packet(s)
will be “deflected” away from the preferred direction, exiting the node along other link(s). In
particular, some packets may temporarily move further away from their destinations. This
paper focuses on the general problem of many-to-many routing where an arbitrary number of
packets can originate at any network node and an arbitrary number of packets can be destined
for any node. In the context of hot potato routing, it is assumed that the number of packets
originating at a node does not exceed the degree of the node. Moreover, we will consider
the routing to take place in the following dynamic context: an adversary constructs both a
many-to-many routing problem with k packets and a set of injection times for each of the
packets (subject to the constraint that packets are not injected so as to exceed the degree of
the node). Note that since all algorithms considered here are deterministic, this is equivalent
to the situation where an adversary injects packets as it observes the routing process. In this
setting we will analyze the number of steps (following its injection) it takes any packet to
arrive at its destination as a function of k and the initial origin to destination distance that
the packet has to travel.
We give optimal and nearly optimal deterministic algorithms for many-to-many packet
routing in commonly occurring networks such as the hypercube, meshes and tori of various
dimensions and sizes, trees and hypercubic networks such as the butterfly. All these algorithms
are analyzed using a charging scheme that may be applicable to other algorithms as well.
Moreover, all bounds hold in the dynamic setting.
Three key reasons for favoring deflection routing are: (i) potentially faster switching since
no messages are buffered at intermediate nodes, (ii) deflections disperse the packets adaptively
and may afford the possibility of avoiding “hot spots,” and (iii) the elimination of queues can
reduce the price of switching hardware. For these reasons deflection routing was used in
parallel machines such as the HEP multiprocessor [17], as well as high-speed communications
networks [15], especially in optical networks [1, 9, 15, 20], where buffering involves transforming
the packets into electronic media.
Although it has been advocated for thirty years (see Baran [3]), few papers have attempted
any precise analysis for hot potato schemes, while experimentally these algorithms seem to
work exceptionally well [1, 9, 10, 13, 15, 16]. One of the reasons for this may be the fact that
the (usually adaptive) dispersion of packets in hot-potato routing seems to make the analysis
quite difficult. Feige and Raghavan [8] resurrected interest in this topic within the theoretical
computer science community by analyzing average case and randomized hot potato algorithms
for the torus and hypercube networks. They also called attention to an important paper by
Hajek [11] concerning worst case hot potato routing for arbitrary (i.e., many-to-many) routing
problems.
Mansour and Patt-Shamir [14] considered the general problem of many-to-many routing of
k packets in a store and forward manner. They were able to show that any greedy store and
forward algorithm that only uses shortest paths will route packets with arbitrary origins and
destinations so that each packet p arrives in at most dist (p) + k − 1 steps, where dist (p) is
the length of the shortest path from the source of p to its destination. A greedy algorithm in
this setting is an algorithm that would always forward a packet if there is an available link on
some shortest path to the destination. One can note that achieving a multiplicative bound of
dist (p) · k is rather obvious, but the additive bound they derive is non-trivial. This bound
is also optimal in the sense that there are cases (for example, when all the packets have the
same destination) when no better bound is possible. Their analysis is easily seen to hold in
our dynamic setting.
Although independent and not motivated by the Mansour and Patt-Shamir [14] result,
Hajek [11] gave a similar result for hot potato routing. In the case of store and forward
routing, an edge conflict results in the buffering and thereby delaying of a packet by one step.
In the case of hot potato routing on an undirected network an edge conflict may result in a
deflection that may delay a packet by two steps. Hajek [11] (and later Brassil and Cruz [6])
considered two cases: (i) many-to-one routing in arbitrary networks (i.e., the special case
when all packets have the same destination), and (ii) many-to-many routing in the hypercube
network. For both cases Hajek was able to devise a routing algorithm that guarantees delivery
in at most δN + 2(k − 1) steps, where δN is the diameter of the network and k is the total
number of packets. Moreover, this bound is achieved by any hot potato algorithm that gives
priority to packets based on the present distance to their destination; i.e., packets with shortest
distance to go have priority. Once again, we observe that achieving a bound of δN · 2(k − 1)
is quite easy for such schemes but Hajek’s proof is relatively subtle. In fact, it is not at all
clear that Hajek’s analysis can be applied in the dynamic setting. The analysis by Brassil and
Cruz [6] can be used to obtain a dist (p) + 2(k − 1) bound for hypercube, improving upon
the δN + 2(k − 1) bound in Hajek [11].
Recognizing that Hajek’s analysis does not extend to the mesh, Ben-Dor, Halevi and Schuster [5] were the first to give many-to-many algorithms for the n × n two-dimensional mesh and
also for higher dimensional meshes. For example, they give a somewhat complicated “greedy”
algorithm for the $n \times n$ two-dimensional mesh that routes $k$ packets within $O(n\sqrt{k})$ steps. The
bound for a $d$-dimensional, $n^d$-node mesh is $O(\exp(d)\, n^{d-1} k^{1/d})$, where $k$ is the total number of
packets. When k is of the order of the size of the network these bounds coincide asymptotically
with O(dist (p) + k) bounds, but clearly are not as good when k is smaller.
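The comparison between the two kinds of bounds can be made concrete for $d = 2$. The following is only a worked illustration; it assumes nothing beyond the fact that $\mathrm{dist}(p) \le 2(n-1)$ on the $n \times n$ mesh (its diameter).

```latex
% d = 2: the general bound specializes to the two-dimensional one
O\!\left(\exp(d)\, n^{d-1} k^{1/d}\right)\Big|_{d=2} = O\!\left(n\sqrt{k}\right)

% Heavy load, k = \Theta(n^2): both bounds are \Theta(k)
n\sqrt{k} = \Theta(n^2) = \Theta(k),
\qquad \mathrm{dist}(p) + k \le 2(n-1) + k = \Theta(k)

% Light load, k = O(1): the mesh bound stays \Theta(n) even for nearby destinations
n\sqrt{k} = \Theta(n),
\qquad \mathrm{dist}(p) + k = \mathrm{dist}(p) + O(1)
```

So for a packet with a nearby destination in a sparsely loaded mesh, a bound of the form $O(\mathrm{dist}(p) + k)$ can be far smaller than $O(n\sqrt{k})$.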
Both the Mansour and Patt-Shamir [14] and the Hajek [11] results are derived by and can
be interpreted as saying that contentions (deflections) can be charged in such a way that no
packet delays another packet more than once. The implicit and still unrealized goal of Hajek
was to find a simple (and we might add easy to implement) class of hot potato algorithms
for which the bound dist (p) + 2(k − 1) holds for all undirected networks. (We remark that
Hajek had a more general formulation applying to directed networks but we choose to restrict
attention to undirected networks; in a directed network a deflection might cause a packet to
be moved more than one edge away from its destination. The Mansour and Patt-Shamir result
does apply to directed networks.) Hajek [11] gave an example showing that a natural candidate
for such a class of algorithms (namely, algorithms that use the shortest distance to go priority
rule) will not work for all networks.
Before listing our results in detail we discuss an important point in evaluating hot potato
routing algorithms: their simplicity and “practical appeal”. In order to capitalize on the
potential of shorter switching time, as well as to further reduce the price of switching hardware,
it is important that the routing choices made by a node at each step are simple. Ideally, the
resources used by the routing algorithm — time, extra memory, randomness, communication
— can be quantified precisely and optimized. A more practical approach, taken in essentially
all of the routing literature, is to specify certain features of these algorithms that may imply
their simplicity. We list some features from the least restrictive feature to the most restrictive.
The first feature is locality: the assignments of packets at a node to outgoing links should
depend only on data available at the node. Another feature is the one-pass of packets property
defined by Hajek [11]: for each node and each time step, there is an ordering of the packets in
the node according to some priority rules, so that a packet is deflected only if all outgoing links
on shortest paths to its destination are assigned to packets before it in the ordering. Although
not explicitly defined in [11] we assume that the priority of a packet has to depend only on its
source, destination, its current node, and the link it arrived on. The most restrictive feature
is the one-pass of links property defined by Feige and Raghavan [8]: for each node there is
a fixed ordering of the incoming links, so that the incoming packets are always scanned and
routed in this order. The last two properties can be extended to respective $\ell$-pass properties.
For example, in an $\ell$-pass of links algorithm the links are scanned at most $\ell$ times and in each
pass some packets get routed with all packets routed by the end of the $\ell$th pass. We assume
that the only information obtained from previous passes is which outgoing links have already
been assigned.
Adhering to these principles does not guarantee the intended simplicity. For instance, a
node in a local algorithm may store the entire history of its previous decisions, and packets
may contain the entire history of the nodes they passed on their way. However, typical local
algorithms do not store information from round to round, and typical packet headers do
not contain significant data on the packet's history. Similarly, a one-pass of links algorithm might
perform very complicated calculations and store considerable data as it passes among the
links. However, one usually perceives a one-pass of links algorithm as storing no additional
data and performing O(1) operations per link. A one-pass of packets algorithm seems to involve
storing the priority for each packet and sorting according to the priority, so it would require
more operations and additional storage compared with a simple one-pass of links algorithm.
Routing algorithms designed to perform well in the worst case should have “practical appeal”. Again, it is hard (or even harder) to quantify this feature of the algorithm. We put some
emphasis on “greedy” algorithms, agreeing with Ben-Dor, Halevi and Schuster [5] that such
algorithms have the “practical appeal” desired by practitioners. Motivated by the discussion
in Ben-Dor et al. [5] and similar to the definitions in Ben-Aroya, Eilam, and Schuster [4], and
Feige [7], we distinguish between two types of greediness. An algorithm is partially greedy if it
always assigns a packet an outgoing link on one of the shortest paths to its destination if such
a link is available. Note that this property is local in the sense that an algorithm may assign a
link e1 to one packet and deflect another packet on link e2 , even if switching the links among
the packets would avoid the deflection. An algorithm is totally greedy if it assigns packets at
each node to outgoing links so as to minimize the number of deflections. Note that usually a
“simple” algorithm would not possess the global property since this would require a maximum
matching between packets and good links.
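The difference between the two notions of greediness can be illustrated at a single node. The sketch below is not taken from the paper: the function names, the dictionary representation of "good links", and the link labels `e1`, `e2` are assumptions made for illustration. The partially greedy rule scans packets in a fixed order, while the totally greedy rule computes a maximum bipartite matching by augmenting paths. In both functions `None` marks a packet that would be deflected; the assignment of leftover links to deflected packets is omitted.

```python
def partially_greedy(good_links):
    """Scan packets in order; each takes its first good link that is still free."""
    taken, assignment = set(), {}
    for packet, links in good_links.items():
        choice = next((e for e in links if e not in taken), None)
        if choice is not None:
            taken.add(choice)
        assignment[packet] = choice          # None means "deflected"
    return assignment


def totally_greedy(good_links):
    """Minimize deflections via maximum bipartite matching (augmenting paths)."""
    owner = {}                               # link -> packet currently holding it

    def try_assign(packet, seen):
        for e in good_links[packet]:
            if e in seen:
                continue
            seen.add(e)
            # Take a free link, or evict the holder if it can move elsewhere.
            if e not in owner or try_assign(owner[e], seen):
                owner[e] = packet
                return True
        return False

    for packet in good_links:
        try_assign(packet, set())
    assignment = {p: None for p in good_links}
    for e, p in owner.items():
        assignment[p] = e
    return assignment


# Packet A can make progress on e1 or e2; packet B only on e1.
good = {"A": ["e1", "e2"], "B": ["e1"]}
partial = partially_greedy(good)   # A grabs e1 first, so B is deflected
total = totally_greedy(good)       # A is moved to e2, so nobody is deflected
```

Here the partially greedy scan deflects B, whereas switching the links between the two packets avoids the deflection, which is exactly the situation described above.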
Results. In Section 2 we use a class of deflection arguments to give a general charging
scheme that can be used for the analysis of hot potato routing algorithms. In this scheme
we show how a certain property of the algorithm ensures that each deflection of a packet p
would be charged to a different packet. We demonstrate the applicability of this scheme by
showing simple algorithms achieving the dist (p) + 2(k − 1) bound for many-to-many routing
in tree networks, the butterfly network, and lightly loaded two-dimensional meshes and tori.
All these algorithms but the last satisfy the one-pass of links property. The algorithm for
lightly loaded two-dimensional meshes and tori satisfies the two-pass of links property and
one-pass of packets property. Feige [7] independently shows that for trees every totally greedy
algorithm achieves the dist (p) + 2(k − 1) bound, and then shows how to use the result to
achieve a bound of $2(\rho_N + k - 1)$ for any network $N$, where $\rho_N$ is the radius of the network
(i.e., $\min_v \max_u \mathrm{dist}(u, v)$). A similar but somewhat weaker result of $2(\delta_N + k - 1)$ was
independently given by Symvonis [18]. The 2(ρN + k − 1) bound is not necessarily achieved by
a totally greedy algorithm and indeed Feige [7] shows that for some networks, totally greedy
algorithms using simple priority rules to avoid livelock will result in significant (i.e., exponential
in k) delay to a particular packet.
The main algorithmic results of this paper are in Sections 3 and 4. In these sections we
consider many-to-many routing algorithms for multi-dimensional meshes and tori. In Section 3
we give an algorithm that achieves the bound of dist (p) + 4(k − 1). Note that since the
hypercube is a log N -dimensional mesh this result applies also to the hypercube. The algorithm
can be implemented to satisfy either the one-pass of packets property or the three-pass of links
property. The algorithm has the additional feature that when applied to permutation routing in
an n × n two-dimensional mesh it maintains the $O(n^{1.5})$ bound given in Bar-Noy et al. [2]. The
advantage of our algorithm is that it also applies to many-to-many routing. We conjecture that
the correct bound for this algorithm is the desired dist (p) + 2(k − 1). Feige [7] has analyzed
a variant of our algorithm and shows that it achieves the bound dist (p) + 2k for meshes and
tori of all dimensions. It is also tempting to conjecture that these algorithms might be optimal
(i.e., achieve worst case O(n) for permutation routing) but Symvonis [19] has shown this to be
false for dimensions bigger than two.
In Section 4 we exhibit a more complicated, but partially greedy, algorithm that achieves
the desired dist (p) + 2(k − 1) bound. We were not able to implement this algorithm in a
bounded number of passes of packets (and thus also not in a bounded number of link–passes).
A modification of this algorithm applied to the two-dimensional case achieves the desired
dist (p) + 2(k − 1) bound and also maintains the $O(n^{1.5})$ bound given in [2] for permutation
routing. For the two-dimensional case, Ben-Aroya, Eilam and Schuster [4] independently
obtained an equivalent algorithm. As Feige [7] observes, the dist (p) + 2(k − 1) bound is
somewhat arbitrary in that this bound is not a lower bound for all networks (e.g., a star
network). For the mesh and, in particular, for the two-dimensional mesh, it is not known if
the dist (p) + 2(k − 1) bound is necessary. If all packets are being routed to one destination
then Ben-Aroya et al. prove that all packets can be routed in $\max_p \mathrm{dist}(p) + k$ steps, but it is not
clear that the single destination case is a worst case.
2 The General Charging Scheme
In this section we present a general charging scheme for analyzing deflection routing algorithms.
Consider a packet $p$ that was deflected at time $t_1$ by packet $p_1$. (Packet $p$ may be deflected
by more than one packet, in which case we are free to choose $p_1$ to ensure certain desirable
properties.) Define a deflection sequence and a deflection path with respect to this deflection
as follows. Follow packet $p_1$ starting at time $t_1$ either to its destination or up to time $t_2 > t_1$.
Suppose that at time $t_2$ packets $p_1$ and $p_2$ meet. (Note that we are free to choose the time $t_2$
and the packet $p_2$.) Follow $p_2$ from time $t_2$ either to its destination or until some time $t_3 > t_2$,
when it meets $p_3$. Then follow packet $p_3$. Continue in the same manner until a packet $p_\ell$
is followed to its destination. Define the sequence of packets $p_1, p_2, \ldots, p_\ell$ as the deflection
sequence of $p$ at time $t_1$. Define the path that follows this sequence of packets from the point
of deflection to the destination of $p_\ell$ to be the deflection path. The following claim is the key
to our charging scheme.
Claim 1 Suppose that for any deflection of packet $p$ from node $v$ to node $u$ the shortest path
from node $u$ to the destination of $p_\ell$ (the last packet in the deflection sequence) is at least as
long as the deflection path. Then, $p_\ell$ cannot be the last packet in any other deflection sequence
of packet $p$. Consequently, we can “charge” the deflection to packet $p_\ell$.
Proof. Suppose that $p_\ell$ is the last packet in more than one deflection sequence of packet
$p$. Suppose that two such deflections occurred at nodes $v_1$ and $v_1'$ at times $t_1$ and $t_1'$, where
$t_1 < t_1'$. In order for $p_\ell$ to be the last packet in both sequences, both deflection paths must end
at the destination of $p_\ell$. Moreover, the length of the path from $v_1'$ to this destination has to
be $t_{\ell+1} - t_1'$ (where $t_{\ell+1}$ is the time when $p_\ell$ reached its destination). Consider the following
path from $u_1$ (the node to which $p$ was deflected from $v_1$) to the destination of $p_\ell$. The first
part of the path follows packet $p$ from $u_1$ to $v_1'$. The last part follows the deflection path of $p$
at time $t_1'$. The length of this path is $t_1' - (t_1 + 1) + t_{\ell+1} - t_1' = t_{\ell+1} - t_1 - 1$, which is shorter
than the deflection path from $v_1$, whose length is $t_{\ell+1} - t_1$; a contradiction.
This claim provides a design principle for routing algorithms, as summarized in the following
corollary:
Corollary 2 If the deflection sequence for each of the deflections incurred by a routing algorithm satisfies the conditions of Claim 1, then the arrival time of each packet is bounded by
dist (p) + 2(k − 1), where dist (p) is the length of the shortest path from the source of packet
p to its destination and k is the number of packets.
We note that this claim applies also to the dynamic case.
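The step from Claim 1 to the corollary can be spelled out as follows. This is a sketch rather than the authors' wording: $T(p)$ and $D(p)$ are notation introduced here for the arrival time of $p$ (counted from its injection) and the number of deflections $p$ suffers. It assumes, as noted in the Introduction, that on an undirected network a deflection delays a packet by at most two steps, and, as the $k - 1$ in the bound suggests, that $p$ is never charged for its own deflections.

```latex
\begin{align*}
T(p) &\le \mathrm{dist}(p) + 2\,D(p)
  && \text{a deflection wastes one step and adds at most one to the distance} \\
D(p) &\le k - 1
  && \text{Claim 1: distinct deflections of $p$ are charged to distinct packets} \\
T(p) &\le \mathrm{dist}(p) + 2(k - 1)
\end{align*}
```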
In the rest of this section we give some simple applications of Claim 1 for three cases:
trees, the butterfly network, and two-dimensional meshes.
First, consider any partially greedy algorithm on a tree. Suppose that a packet p is deflected
from node v to u. The deflection path is given by following the unique packet p1 that deflected
p (there is only one edge adjacent to v along which p can make progress) either until p1 reaches
its destination or until it is deflected by $p_2$, in which case we continue by following $p_2$. Let $p_\ell$ be
the last packet in the deflection sequence. It is easy to see that the deflection path is a shortest
path from v to the destination of $p_\ell$ and that u is not in any such shortest path. Hence, the
condition of Claim 1 holds.
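The tree case is small enough to simulate directly. The following is a minimal sketch, not the authors' algorithm: it assumes a static problem (all packets injected at time 0), one packet per link direction per step, and an arbitrary fixed scan order for breaking ties. The names `next_hops` and `route_on_tree` and the example instance are hypothetical.

```python
from collections import defaultdict


def next_hops(adj, dest):
    """For every node, the neighbor on the unique tree path toward dest (BFS from dest)."""
    hop = {dest: None}
    frontier = [dest]
    while frontier:
        nxt = []
        for u in frontier:
            for w in adj[u]:
                if w not in hop:
                    hop[w] = u
                    nxt.append(w)
        frontier = nxt
    return hop


def route_on_tree(adj, packets, max_steps=10_000):
    """Partially greedy hot potato routing on a tree; returns {packet: arrival step}.

    packets maps a name to (source, destination), with source != destination.
    """
    hops = {dst: next_hops(adj, dst) for _, dst in packets.values()}
    pos = {p: src for p, (src, _) in packets.items()}
    arrival = {}
    step = 0
    while pos and step < max_steps:
        step += 1
        at_node = defaultdict(list)
        for p in sorted(pos):                  # fixed scan order (arbitrary tie-break)
            at_node[pos[p]].append(p)
        moved = {}
        for v, here in at_node.items():
            assert len(here) <= len(adj[v]), "more packets than outgoing links"
            taken, deflected = set(), []
            for p in here:
                good = hops[packets[p][1]][v]  # the only link that makes progress
                if good in taken:
                    deflected.append(p)
                else:
                    taken.add(good)
                    moved[p] = good
            free = [w for w in adj[v] if w not in taken]
            for p, w in zip(deflected, free):  # deflected packets leave on leftover links
                moved[p] = w
        pos = {}
        for p, v in moved.items():
            if v == packets[p][1]:
                arrival[p] = step              # absorbed on arrival
            else:
                pos[p] = v
    return arrival


# Path 0-1-2-3-4 (a tree); both packets start at node 1 and want node 4.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
arrival = route_on_tree(adj, {"a": (1, 4), "b": (1, 4)})
```

In this instance packet `a` arrives at step 3, its shortest-path distance, while `b` is deflected once toward node 0 and arrives at step 5, which meets the dist (p) + 2(k − 1) bound with equality for dist (p) = 3 and k = 2.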
Now, consider end to end routing on the undirected butterfly network (that is, all packets
are injected at level 0 and all destinations are at the last level). We claim that any totally
greedy algorithm guarantees arrival time of dist (p) + 2(k − 1), for every packet p. Again,
we show that the condition of Claim 1 holds. Consider a packet p as it suffers a deflection at
a node v in level i of the butterfly. We claim that the deflected packet p wanted to proceed
to level i + 1. This is because the only case in which p would want to go from level i to level
i − 1 is if it was previously deflected from a node u in level i − 1 to level i. But in this case,
since the algorithm is totally greedy, packet p would not be deflected, but go back from v to
u in the next step. Consequently, we set the first packet in the deflection sequence to be the
packet p1 that advanced to level i + 1 along the edge that p wanted to use. The deflection
path follows the path taken by p1 , switching to p2 that deflects p1 , if that ever happens, and
so forth, until we reach $p_\ell$ that goes straight to its destination. The nodes along this path are
located in monotonically increasing levels, and therefore the path is a shortest path. Note that
because of the butterfly topology, no matter where packet p gets deflected it moves away from
$p_\ell$'s destination. We remark that an analogous result applies to the directed butterfly network.
Finally, consider the lightly loaded two-dimensional mesh or torus. (In the next section
we generalize the following result to the d-dimensional mesh/torus.) We will say that on the
$n_1 \times n_2$ mesh (torus), a routing problem is “lightly loaded” if initially there is at most one
packet in any node in the boundary columns (in the case of the mesh) and at most two packets
in all other nodes (so that on the first step all packets can traverse along their origin row if
they so wish). A routing problem is fully loaded if the number of packets at any node is at
most the degree of the node (i.e., the maximum load consistent with hot potato routing).
We consider the following algorithm for the lightly loaded case, which is a modification of
the algorithm given in [2] for permutations. Packets traverse along their origin row attempting
to enter their destination column in either direction but preferably in the correct direction.
A packet already traveling in the correct direction on its destination column has the highest
priority and will never be deflected again. Packets traveling in their destination column have
priority over row packets attempting to enter the column. Packets traveling in a row towards
their destination column have priority over packets wishing to reverse their direction on this
row. Subject to these priority constraints a packet heading in the wrong direction (on either
a row or column) will reverse direction whenever possible. In particular, two packets traveling
in the wrong direction can meet at a node and both reverse directions.
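One way to read these priority rules is as a sort key over the packets contending at a node. The sketch below is an interpretation, not the authors' specification: the `PacketState` fields and the total order are assumptions, chosen as one ordering consistent with the rules stated above (the text only fixes the relative priority of some pairs of classes).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PacketState:
    name: str
    in_destination_column: bool   # already traveling on its destination column?
    heading_correctly: bool       # moving toward its current target?


def priority_key(s):
    """Smaller key means higher priority; False sorts before True."""
    # Column packets outrank row packets; within each group, packets moving
    # in the correct direction outrank those that wish to reverse.
    return (not s.in_destination_column, not s.heading_correctly)


packets = [
    PacketState("row_wrong_way", False, False),
    PacketState("col_correct", True, True),
    PacketState("row_toward_column", False, True),
    PacketState("col_wrong_way", True, False),
]
order = [s.name for s in sorted(packets, key=priority_key)]
```

With this key, a packet moving correctly on its destination column is always served first, matching the rule that such a packet is never deflected again.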
Theorem 3 For any two-dimensional mesh or torus and for any lightly loaded dynamic routing
problem every packet p will be routed to its destination within at most dist (p) + 2(k − 1) steps.
Proof. In the lightly loaded case, a packet can only be deflected on its origin row (trying
to move towards or enter its destination column) or on its destination column (trying to move
towards its destination). Let us concentrate on p as it suffers a deflection at node v. If p
is deflected on its destination column or as it tries to enter its destination column then at
least one packet p1 at node v is heading directly for its destination. Packet p1 will not be
deflected in the future. The deflection path we use in this case is p1 ’s path, which is clearly the
shortest path to p1 ’s destination, and p gets deflected away from p1 ’s destination. If packet p
is deflected on its origin row while trying to move towards its column by packet p1 , then p1
is heading towards its destination column. We have two possibilities. The first possibility is
that p1 will enter its column in the correct direction. In this case we again use p1 ’s path as
the deflection path. The second possibility is that p1 is deflected by p2 which is heading in the
correct direction on this column. In this case our deflection path will switch to p2 . In either
case, the deflection path is a shortest path and p gets deflected away from the end of this shortest
path. (In the case of the torus, p’s distance to the other end of the deflection path may remain
unchanged after the deflection.) Therefore, in all cases Claim 1 applies.
Notice that, in particular, any instance of permutation routing is lightly loaded. The above
argument shows that on the two-dimensional mesh or torus, our algorithm completes routing a
permutation within $O(n^2)$ steps. However, the $O(n^{3/2})$ analysis that was given in [2] applies to
our algorithm as well and improves upon the many-to-many analysis. Of course, our analysis
applies to any many-to-many instance, not only permutations.
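For a full permutation on the $n \times n$ mesh, the quadratic bound follows by direct substitution. The computation below is only a worked instance of Theorem 3; it assumes $k = n^2$ packets (one per node) and uses the mesh diameter $2(n-1)$ as the largest possible value of $\mathrm{dist}(p)$.

```latex
\begin{align*}
\mathrm{dist}(p) + 2(k - 1)
  &\le 2(n - 1) + 2(n^2 - 1) \\
  &= 2n^2 + 2n - 4 \\
  &= O(n^2)
\end{align*}
```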
3 The dimension-by-dimension mesh algorithm
Consider the two-dimensional $n_1 \times n_2$ mesh/torus again. We modify the algorithm presented in
the previous section to handle the fully loaded case. In this case some packets may not be able
to enter their origin row and will be deflected into their origin column. These packets have the
lowest priority and attempt to enter any row. These packets attempt to move towards their
destination row, but they move into any receptive row, if possible. In this way a deflection
(away from the destination row) is well defined. As soon as a packet manages to get into a
row, it follows the algorithm for the lightly loaded case.
Theorem 4 In any fully loaded dynamic routing problem, any packet p will reach its destination within at most dist (p) + 4(k − 1) time steps.
Proof. In the fully loaded case, p might first be deflected onto its origin column. We will
have to show how to account for all of p’s deflections until it enters some row. (Once p enters a
row, the charging for deflections proceeds exactly as in the lightly loaded case.) Let us say that
while $p$ is in its origin column it gets deflected at times $t_1, t_2, \ldots, t_\ell$ and that these deflections
take place at rows $r_1, r_2, \ldots, r_\ell$. Since $p$ could not enter row $r_i$ at time $t_i$ there must be some
packet $p_i$ which is heading in the correct direction along row $r_i$. Notice that the times $t_i$ are
all distinct, but the rows $r_i$ and the packets $p_i$ are not necessarily all distinct. We will say that
$p$ is deflected by $p_i$ at time $t_i$ but we will not necessarily charge $p_i$ for this deflection. Note
that $p_i$ might be deflected by some packet $q$ as it tries to enter its destination column and this
same packet $q$ might also deflect $p_{i+1}$ and many other subsequent $p_j$'s. Consider the sequence
$p_1, p_2, \ldots, p_\ell$. It may be the case that a given packet appears many times in this sequence.
We want to account for deflections occurring in a subsequence $p_i, \ldots, p_j$ with $p_i = p_j$ and then
effectively remove this subsequence (excluding $p_j$). Continuing in this way, we will end up with
a final sequence of deflections without any packet being repeated. At this point we can charge
the remaining deflections to the packets occurring in this final sequence. (It is these remaining
deflection charges which cannot be simply separated from the deflection charges that will be
incurred once the packet enters a row and hence each packet may be charged twice.)
Let $t_i$ be the first time that $p$ is deflected by some $p_i$ which will again deflect $p$ while it is
still in its origin column, and let $t_j$ be the last time that $p_j = p_i$ so deflects $p$. Obviously $p_i$ is
deflecting $p$ at the same row $r_i = r_j$. It is also clear that there must be a 1-1 correspondence
between the deflections of $p$ occurring at times $t_i, t_{i+1}, \ldots, t_{j-1}$ and the deflections of $p_i$ that
are occurring in row $r_i$, from time $t_i + 1$ to time $t_j - 1$. (This is necessary because $p$ and $p_i$ meet
again at time $t_j$.) Each of these deflections of $p_i$ can be charged to a unique packet exactly as
in the lightly loaded case. Indeed, these are the charges that are being incurred by $p_i$ during
this time period as it is trying to enter its destination column. We use the same charges for
the corresponding deflections of $p$.
We will now remove the subsequence $p_i, \ldots, p_{j-1}$. The remaining sequence may still have
packets occurring more than once and we want to apply the same procedure to the sequence
that remains. The only issue to worry about is whether the charges will all be to distinct packets;
that is, after we remove a subsequence having charged certain packets, the same packets
cannot be charged again as we try to subsequently remove another subsequence. Suppose after
removing $p_i, \ldots, p_{j-1}$, we next wish to remove $p_u, \ldots, p_{v-1}$. Clearly $u > j$ since $p_j$ cannot equal
$p_u$, as $t_j$ is the time of the last occurrence of $p_j$ in the deflection sequence. We claim that
any packet receiving a charge for one of the deflections in the first subsequence completes too
early to be “caught” by any deflection path that begins at time $t_u$ or later. Consider the last
time $p_i$ is deflected before time $t_j$. As a result of this deflection $p_i$ is either deflected away
from its destination column, or remains at the same distance from its destination column. (The
latter may happen only in a torus.) Trace $p_i$'s path from this deflection until it last deflects $p$
at time $t_j$, then trace $p$'s path until it is first deflected by $p_u$ at time $t_u$, then trace $p_u$'s path
until any deflection it suffers at its destination column. This path is at least one step behind
any deflection path corresponding to deflections of $p_i$ before time $t_j$.
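The bookkeeping in this proof, namely repeatedly cutting out the stretch between the first and last occurrence of a repeated deflecting packet, can be sketched in a few lines. This illustrates only the subsequence removal, not the charging argument itself; the function name `collapse_repeats` and the packet labels are hypothetical.

```python
def collapse_repeats(seq):
    """Remove p_i, ..., p_{j-1} for each repeated packet, keeping its last occurrence."""
    seq = list(seq)
    i = 0
    while i < len(seq):
        # Index of the last occurrence of the packet currently at position i.
        last = len(seq) - 1 - seq[::-1].index(seq[i])
        if last > i:
            del seq[i:last]   # drop everything up to, but excluding, the last occurrence
        i += 1
    return seq


# Packets that deflect p while it is still in its origin column, in time order.
deflectors = ["a", "b", "a", "c", "b", "d", "c"]
final = collapse_repeats(deflectors)
```

On this input the first pass removes the stretch between the two occurrences of `a`, the second removes the stretch between the two occurrences of `c`, and the final sequence `["a", "c"]` contains no repeated packet. The deflections in the removed stretches are the ones accounted for through the charges of the repeated packet.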
We now wish to generalize this result to any number of dimensions of any size. (Clearly the
two-dimensional result did not depend on the size of the mesh.) In particular, our generalization
applies to the hypercube. We now describe this generalization to any number of dimensions
and once again distinguish between the lightly loaded case (where at most two packets are
injected at each node) and the fully loaded case. Say, we have a d-dimensional mesh and
arbitrarily order the dimensions, calling them dimensions 1, 2, . . . , d.
The lightly loaded case: A packet traverses along dimension 1 until it is able to enter
dimension 2 at the proper dimension-1 coordinate value, and then traverses this dimension until it can
enter dimension 3 at the proper dimension-2 coordinate value, etc. Once a packet enters a given dimension
it will stay in that dimension until it enters a higher dimension. More precisely, a packet now in
dimension i wishing to next fix dimension j will enter the highest dimension h with i ≤ h ≤ j
that it can enter. If the packet cannot enter dimension j in the correct direction then it is said
to be deflected. As before, once a packet enters dimension d it stays in this dimension until it
reaches its destination. And as before, packets which are deflected always attempt to change
direction as soon as possible.
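The rule for choosing the next dimension can be written as a small selection function. This is a sketch under assumptions: `can_enter` is a hypothetical predicate standing in for "the link into dimension h is available to this packet at this step", and staying in the current dimension i is assumed to be always possible.

```python
def next_dimension(i, j, can_enter):
    """Highest dimension h with i <= h <= j that the packet can enter.

    i is the packet's current dimension, j the dimension it wishes to fix next.
    """
    for h in range(j, i, -1):      # try j, j-1, ..., i+1
        if can_enter(h):
            return h
    return i                       # otherwise stay in the current dimension


# A packet in dimension 1 wants dimension 4, but only dimensions 2 and 3 are free.
chosen = next_dimension(1, 4, lambda h: h in {2, 3})
stuck = next_dimension(1, 4, lambda h: False)
```

In the first call the packet settles for dimension 3, the highest dimension it can enter; in the second it stays in dimension 1.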
Theorem 5 The above algorithm guarantees delivery time of dist (p) + 2(k − 1), for every
packet p in any lightly loaded dynamic routing problem on a d-dimensional mesh or torus. In
particular, the bound holds for the n-dimensional hypercube with $N = 2^n$ nodes.
Proof. The proof is a direct generalization of the proof for the two-dimensional case. Namely,
when a packet p is deflected we trace a deflection path forward in time and dimensions. This
path is a shortest path, and p is deflected away from its other end.
The fully loaded case: Consider the first step for a given packet p. Suppose that the first
dimension which must be corrected is i. Packet p will attempt to enter the highest possible
dimension ℓ with ℓ ≤ i. If p can enter such a dimension then it proceeds as in the lightly loaded
case. Otherwise, p enters the lowest dimension h, with h > i. Thereafter p will attempt to enter
a dimension less than or equal to i at which point it proceeds as in the lightly loaded case. While
doing so the dimensions upon which p is traveling are monotonically non-increasing. That is,
p can be “forced” onto a lower dimension but never a higher dimension. (It has priority over
packets wishing to enter h from a higher dimension, but not over packets already executing
the lightly loaded case and wishing to enter h from a lower dimension.) Suppose that p is in
dimension h while attempting to enter the lightly loaded case. If p’s destination in the first
h − 1 dimensions is x_1, ..., x_{h−1}, then during the time period while p is in dimension h and is
still attempting to enter the “lightly loaded case,” it sets its “target” to be any node consistent
with these coordinate values so that there is again a well defined notion of a deflection.
Theorem 6 The above algorithm guarantees delivery time of dist(p) + 4(k − 1), for every
packet p in any fully loaded dynamic routing problem on a d-dimensional mesh or torus. In
particular, the bound holds for the n-dimensional hypercube with N = 2^n nodes.
Proof. Again, this is a direct generalization of the two-dimensional case. In particular, we
must account for the deflections until a packet enters the lightly loaded stage. Any deflection
of a packet p in this preliminary stage is attributed to a packet q traveling in the
correct direction along dimension i, where i is the first dimension that must be corrected. Then,
as in the two-dimensional case, we look at the deflection sequence and remove subsequences
until there are no repetitions in the sequence. This can be done since once again the algorithm
proceeds in such a manner that if packet p is twice deflected by packet q, they must
intersect in the same place, and hence again there is a one-to-one correspondence between
the deflections of p and those of q between the two times q deflected p.
We show how to implement the algorithm for the lightly loaded case in two ways: (i) as a
two-pass of links algorithm and (ii) as a one-pass of packets algorithm. Then, we show how
to implement the algorithm for the fully-loaded case in two ways: (i) as a three-pass of links
algorithm, and (ii) as a one-pass of packets algorithm.
The lightly loaded case: We first show the implementation as a two-pass of links algorithm.
Consider a packet p entering a node v using the incoming edge in dimension i and direction
φ (for 1 ≤ i ≤ d, and φ ∈ {+, −}). In the algorithm for the lightly loaded case there are two
possibilities: (1) if packet p wishes to go out in the same dimension and the same direction,
then it always succeeds in doing so. (2) if packet p wishes to go out on an outgoing link of
dimension h > i it succeeds only if no packet of higher dimension wishes to use the same
outgoing edge. Each of these two possibilities is implemented in a separate pass.
For each node v, in the first pass we scan its incoming edges in any fixed order and assign
the outgoing edges for all packets that wish to continue in their incoming direction; that is,
if a packet p enters v in the φ direction of dimension i and wishes to continue in the same
dimension and the same direction, then the outgoing edge of v in the φ direction of dimension
i is assigned to p. In the second pass we scan the incoming edges in a fixed order from
dimension d to dimension 1 (where in each dimension we first scan the incoming edge in the
“+” direction and then the incoming edge in the “−” direction), and assign to each packet p
its desired outgoing link unless this edge has been assigned already to a previous packet. In
this case we assign to p an outgoing edge of the highest dimension amongst all the available
outgoing edges whose dimension is not higher than the dimension of the desired link of p. It
is easy to see that this scheme indeed implements the algorithm for the lightly loaded case.
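The two passes can be sketched for a single node as follows. This is an illustration under stated assumptions, not the authors' code: the name `two_pass_assign` and the dictionary `incoming` are hypothetical, the node is assumed to have all 2d outgoing edges (an interior mesh node or a torus node), and the fallback used when no free edge of dimension at most the desired one exists is an assumption, since the text does not cover that case.

```python
def two_pass_assign(incoming, d):
    """Assign outgoing links at one node.

    incoming maps an incoming edge (dim, sign) to the outgoing link
    (dim, sign) desired by the packet that arrived on it, where dim is
    in 1..d and sign is "+" or "-". Returns a dict mapping each
    incoming edge to the outgoing link assigned to its packet.
    """
    free = {(dim, s) for dim in range(1, d + 1) for s in "+-"}
    assigned = {}

    # Pass 1: packets continuing in their incoming dimension and
    # direction always get that outgoing edge.
    for edge, want in incoming.items():
        if want == edge:
            assigned[edge] = want
            free.discard(want)

    # Pass 2: scan incoming edges from dimension d down to 1,
    # "+" before "-" within each dimension.
    for dim in range(d, 0, -1):
        for s in "+-":
            edge = (dim, s)
            if edge not in incoming or edge in assigned:
                continue
            want = incoming[edge]
            if want in free:
                choice = want
            else:
                # Highest-dimension free edge not above the desired one.
                pool = [e for e in free if e[0] <= want[0]]
                if not pool:           # assumption: fall back to any free edge
                    pool = list(free)
                choice = max(pool, key=lambda e: (e[0], e[1] == "+"))
            assigned[edge] = choice
            free.remove(choice)
    return assigned
```

In this sketch, ties between the two directions of one dimension are broken in favor of "+", which is an arbitrary choice.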
This two-pass of links algorithm can be implemented as a one-pass of packets algorithm.
The order of the packets would be first the packets that wish to continue in their incoming
direction, and then the rest of the packets ordered according to the dimension of their incoming
edges.
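One possible packet ordering for the one-pass variant is sketched below. The name `one_pass_order` is hypothetical, and the choice to sort the remaining packets by decreasing incoming dimension (with "+" before "−") is an assumption made to mirror the scan order of the second pass; the text only says that they are ordered by the dimension of their incoming edges.

```python
def one_pass_order(incoming):
    """Order packets (identified by their incoming edge) for one pass.

    incoming maps an incoming edge (dim, sign) to the desired outgoing
    link (dim, sign). Packets continuing straight come first; the rest
    follow by decreasing incoming dimension, "+" before "-".
    """
    def key(edge):
        dim, sign = edge
        continuing = incoming[edge] == edge
        return (0 if continuing else 1, -dim, 0 if sign == "+" else 1)

    return sorted(incoming, key=key)


inc = {(1, "+"): (1, "+"), (1, "-"): (2, "+"), (2, "+"): (2, "+")}
print(one_pass_order(inc))
```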
The fully loaded case: We implement the algorithm for the fully loaded case as a three-pass of links algorithm as follows. In the first two passes we consider only the packets currently
following the algorithm of the lightly loaded stage and repeat the algorithm described above.
In the third pass we scan the incoming edges in any order and assign the outgoing links for
the rest of the packets (i.e., those that try to enter the lightly loaded stage). Each such packet
enters the lightly loaded stage if it can do so using one of the available outgoing links.
This three-pass of links algorithm can also be implemented as a one-pass of packets algorithm. The order of the packets would be first the packets currently following the algorithm
of the lightly loaded stage in the same order as above and then the rest of the packets.
Some aspects of these algorithms are not at all essential for establishing the stated bounds.
For example, while a packet is attempting to enter the “lightly loaded stage,” it does not matter
if it sets its destination row as its “target.” For the higher dimensional meshes we can keep a
packet in row i while waiting to enter row j even if there is some row h with i ≤ h ≤ j which is
available. We have presented the algorithms so that they are simple to implement
and, we hope, work well “in practice.” In particular, we suspect that these algorithms
can be shown to have provably good (say, O(n)) performance for routing permutations.
4 A better algorithm for the multi-dimensional mesh
In this section we give another algorithm for the d-dimensional mesh. This algorithm routes
any k packets so that the delivery time of each packet p is bounded by dist(p) + 2(k − 1). Note
that a hypercube with N nodes is a (log N)-dimensional mesh, and thus our algorithm works for the hypercube.
When applied to the two-dimensional case, this algorithm can be slightly modified so that it
still maintains the O(n^{1.5}) bound for routing a permutation, as does the algorithm of [2]. For two
dimensions this algorithm also works on tori. However, it does not seem to extend to
tori with three or more dimensions.
Our algorithm is motivated by the general charging scheme for analyzing deflection routing
algorithms given in Section 2. Consider a packet p that is deflected at time t from node v to
node u. W.l.o.g. assume that the coordinates of v are (x_1, ..., x_r, ..., x_d) and the coordinates
of u are (x_1, ..., x_r − 1, ..., x_d).
We claim that in order for the condition of Claim 1 to hold it suffices to prove the existence
of a deflection sequence p_1, p_2, ..., p_ℓ and a corresponding deflection path with the following
properties:
1. Let v_ℓ be the destination of p_ℓ (the last packet in the sequence). The r-th coordinate of
v_ℓ is at least x_r.
2. The deflection path from v to v_ℓ is a shortest path between these two nodes. Moreover,
in this shortest path the coordinates of v are corrected in a cyclical order (if a correction
is necessary) starting from coordinate r, moving to coordinate r + 1, and so forth up to
the coordinate r − 1 to obtain v_ℓ.
To see this note that if the above two properties hold then the shortest path from u to v_ℓ
is strictly longer than the deflection path from v to v_ℓ.
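For the mesh, this last step can be spelled out as a short calculation. The symbols y_1, ..., y_d for the coordinates of v_ℓ are introduced here only for illustration, and distance is taken to be the sum of coordinate differences, as on a mesh.

```latex
\begin{align*}
\mathrm{dist}(u, v_\ell)
  &= \sum_{j \ne r} |x_j - y_j| + |(x_r - 1) - y_r| \\
  &= \sum_{j \ne r} |x_j - y_j| + (y_r - x_r) + 1
     && \text{since } y_r \ge x_r \text{ (Property 1)} \\
  &= \mathrm{dist}(v, v_\ell) + 1 .
\end{align*}
```

By Property 2 the deflection path from v to v_ℓ has length exactly dist(v, v_ℓ), so it is one step shorter than any path from u to v_ℓ.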
So, the problem at hand is how to define for every deflection the deflection sequence (and
in particular the last packet in the sequence) that would satisfy these two properties. For this
we need the following definitions.
Consider a packet p at an intermediate node v in time t. The good directions of p are all
the directions along which p can advance from v towards its destination. Each such direction
is denoted by a pair from {1, . . . , d} × {+, −}, where the first entry denotes the coordinate and
the second entry denotes the good direction along the coordinate. Define the interval of p at
v with respect to a good direction (r, φ) (where φ ∈ {+, −}) to be the maximal cyclic (closed)
interval that ends at r − 1 and contains only coordinates that are already fixed; that is, if this
interval is [c, r − 1], then for all c ≤ j ≤ r − 1 (if c ≤ r − 1), or for all 1 ≤ j ≤ r − 1 and
c ≤ j ≤ d (if c > r), the current coordinate of p is also its destination coordinate and this is
not the case for c − 1. (The interval may be empty.)
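The interval definition can be made concrete with a small sketch that returns the length of the interval. The function name `interval_length` and its argument layout are assumptions for illustration; coordinates are numbered 1..d as in the text and stored in ordinary tuples.

```python
def interval_length(current, dest, r):
    """Length of the maximal cyclic interval of fixed coordinates
    ending at coordinate r - 1.

    current and dest are tuples of length d holding the packet's
    current and destination coordinates; r (1-based) is the coordinate
    of a good direction, so coordinate r itself is assumed not fixed.
    Returns 0 when the interval is empty.
    """
    d = len(current)
    length = 0
    j = r - 1 if r > 1 else d          # coordinate cyclically before r
    while length < d - 1 and current[j - 1] == dest[j - 1]:
        length += 1
        j = j - 1 if j > 1 else d      # step backwards, wrapping 1 -> d
    return length


# Coordinates 4, 1 and 2 are fixed, so the interval for r = 3 is [4, 2].
print(interval_length((3, 5, 2, 7), (3, 5, 9, 7), 3))
```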
At each time step of the algorithm each packet p that has not yet reached its destination is
associated with a desired direction. Intuitively, this direction is the good direction along which the
packet p currently desires to advance. Initially, we make the desired direction of each packet
the good direction along the minimum coordinate that it has to correct. Suppose that
at time t > 0 packet p is at an intermediate node v and it arrived there along coordinate r.
Then its desired direction at t is its good direction (r′, φ), where r′ is the closest coordinate
greater than or equal to r (again, we assume cyclical order) that is not fixed yet.
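A minimal sketch of this rule on a mesh follows, where the sign of a good direction is determined by comparing the current and destination coordinates. The name `desired_direction` is hypothetical. Calling it with r = 1 gives the initial desired direction, that is, the good direction along the minimum unfixed coordinate.

```python
def desired_direction(current, dest, r):
    """Desired direction of a packet that arrived along coordinate r.

    Scans coordinates r, r + 1, ..., d, 1, ..., r - 1 (1-based, cyclic)
    and returns (c, sign) for the first coordinate c that is not yet
    fixed, with sign "+" or "-" pointing toward the destination.
    Returns None if the packet is already at its destination.
    """
    d = len(current)
    for step in range(d):
        c = (r - 1 + step) % d + 1
        if current[c - 1] != dest[c - 1]:
            sign = "+" if dest[c - 1] > current[c - 1] else "-"
            return (c, sign)
    return None


print(desired_direction((3, 5, 2, 7), (3, 5, 9, 7), 4))
```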
Let us now define the algorithm. At time t there are at most 2d active packets at a node
v = (x_1, ..., x_d) and the algorithm has to assign to each of these packets an outgoing direction.
The assignment of the outgoing directions will obey the following two rules.
Rule 1. Suppose that (r, φ) is the desired direction of some packet p and that the packet q
is assigned to this direction. Then, the interval of q with respect to (r, φ) is at least as
long as the interval of p with respect to this direction.
Rule 2. Suppose that packet p is deflected along direction (r, φ). Let r′ be the closest coordinate greater than or equal to r (cyclically) such that (r′, φ′) is a good direction for p at v. Let
q be the packet assigned to this direction. Then, the interval of q with respect to (r′, φ′)
is at least as long as the interval of p with respect to this direction.
First, we give an algorithm that implements rules 1 and 2. Then, we prove that any
algorithm that obeys these rules achieves delivery time of dist (p) + 2(k − 1) for each packet
p.
Implementing rules 1 and 2 is done as follows. Each packet will have a list of good directions,
beginning with the desired one. We do the assignment of packets to links in rounds. In each
round, all packets compete. Each packet competes for the first possibility on its list. The
priority of each packet is determined by the size of its interval with respect to this possibility,
ties broken arbitrarily. Losing packets remove the head of their list. Winning packets keep on
competing on the same directions in future rounds. A loser in one round may (by changing
its list head and therefore its desired direction) cause a winner in the same round to become a
loser in a subsequent round. The process ends when all losing packets have empty lists. Then,
the final winners move on their winning direction and the rest of the packets are deflected
arbitrarily. Since every round where someone loses reduces the sum of lengths of all lists, this
is guaranteed to terminate.
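The round-based competition can be sketched as follows. The names `compete`, `good_dirs` and `interval` are hypothetical; `interval(p, direction)` is assumed to return the interval size of packet p with respect to a direction. Ties go to the packet that appears first in `good_dirs`, which is one arbitrary tie-breaking choice.

```python
def compete(good_dirs, interval):
    """Run the rounds of competition at one node.

    good_dirs maps each packet to its list of good directions,
    desired direction first. Returns a dict mapping each winning
    packet to the direction it won; all other packets are deflected.
    """
    lists = {p: list(ds) for p, ds in good_dirs.items()}
    while True:
        # Each packet with a non-empty list competes for its list head.
        contenders = {}
        for p, ds in lists.items():
            if ds:
                contenders.setdefault(ds[0], []).append(p)
        losers = []
        for direction, ps in contenders.items():
            winner = max(ps, key=lambda p: interval(p, direction))
            losers.extend(p for p in ps if p != winner)
        if not losers:
            # Every contested direction now has exactly one contender.
            return {ps[0]: direction for direction, ps in contenders.items()}
        for p in losers:
            lists[p].pop(0)  # losers drop the head of their list


sizes = {"a": {(1, "+"): 0, (2, "+"): 2},
         "b": {(1, "+"): 1},
         "c": {(2, "+"): 1}}
good = {"a": [(1, "+"), (2, "+")], "b": [(1, "+")], "c": [(2, "+")]}
print(compete(good, lambda p, direction: sizes[p][direction]))
```

In this example packet a loses direction (1, +) to b in the first round, then displaces c from (2, +) in the second round, so c ends up deflected. Assigning the deflected packets to the remaining outgoing directions is left out of the sketch.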
Fix a coordinate and a direction. Note that the sequence of interval sizes associated with the
packets that won in this coordinate/direction in the rounds of the competition is monotonically
non-decreasing. Therefore, Rule 1 is implemented. Now, consider a deflected packet p. It lost
the competition in all of its good directions. Hence, the interval of the current winner in each
of these directions is no smaller than the interval of p w.r.t. this direction. This easily implies
the implementation of Rule 2.
We now give an alternative procedure that may be easier to implement. We first assign
packets to all directions that are desired directions of at least one packet. Let (r_i, φ_i), for
i = 1, ..., s, be the desired directions. Define the primary candidate of a desired direction
(r_i, φ_i) to be a packet whose desired direction is (r_i, φ_i) and whose interval size with respect
to this direction is maximum. Define the desired interval size of such a direction (r_i, φ_i) to be
the interval size of its primary candidate with respect to this direction.
The order and the assignment of the packets is determined by the following procedure.
Given a prefix of the assignment of packets, the procedure is invoked to determine the next
packet in the list and its assignment.
Step 1: Test whether there is a direction (r_i, φ_i) for which some unassigned packet that is
not a primary candidate of any direction has interval size with respect to (r_i, φ_i) larger
than the desired interval size of (r_i, φ_i).
Step 2: If such a direction (r_i, φ_i) exists, let p be the unassigned packet that is not a primary
candidate of (r_i, φ_i) and whose interval with respect to this direction is the longest. Add p to
the ordered list and assign it to (r_i, φ_i).
Step 3: Otherwise, if there is an unassigned desired direction, add its primary candidate to
the ordered list, and assign it to this direction.
Step 4: Otherwise (i.e., if all the desired directions are assigned packets), pick the unassigned
packet with the longest interval with respect to an unassigned direction. Add this packet to
the ordered list, and assign it to this direction.
Steps 2 and 3 guarantee that if (r, φ) is the desired direction of some packet p, then the
interval of the packet assigned to (r, φ) with respect to this direction must be at least as long
as the respective interval of p. Thus, they implement Rule 1. Step 4 guarantees that if a packet
p is assigned to some direction, then the interval of any deflected packet with respect to this
direction is no longer than the respective interval of p. This implies Rule 2.
Finally, consider the application of the algorithm to the two-dimensional case. This algorithm can be modified slightly so that it still maintains the O(n^{1.5}) bound for permutation
routing. The modification adds the following rule: when several packets are
competing for the same good direction and none of them has priority according to Rules 1 and
2, give priority to the packet that is already traveling in the same direction (in case it is one of
the competing packets).
Now, we prove that any algorithm that obeys Rules 1 and 2 achieves delivery time of
dist(p) + 2(k − 1) for each packet p. For this we show how to construct the deflection sequences
and deflection paths that satisfy the two properties listed above. Consider a deflection sequence
p_1, ..., p_ℓ. We associate the time series t_0 ≤ t_1 ≤ · · · ≤ t_ℓ with this sequence. This time series
has the following meaning: in the corresponding deflection path we follow the path taken by
packet p_i from time t_{i−1} to time t_i, for 1 ≤ i ≤ ℓ. We say that p_i is the active packet of the
sequence at time t, for all t_{i−1} < t ≤ t_i. The deflection sequences are constructed dynamically.
That is, at time step t all the deflection sequences are defined only up to the active packet at
time t. (Note that if the last packet in a deflection sequence has arrived at its destination by
time t, then this sequence is fully-defined at time t.)
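Given the time series of a deflection sequence, the active packet at a time t can be looked up with a binary search. This is only an illustrative sketch: the name `active_packet_index` is hypothetical, and the time series is assumed to be stored as a list [t_0, t_1, ..., t_ℓ].

```python
from bisect import bisect_left


def active_packet_index(times, t):
    """Index i of the active packet p_i at time t.

    times is the non-decreasing list [t_0, t_1, ..., t_l]. Packet p_i
    is active when t_{i-1} < t <= t_i. Returns None when t lies outside
    the half-open range (t_0, t_l].
    """
    if not times or t <= times[0] or t > times[-1]:
        return None
    # bisect_left returns the first index i with times[i] >= t,
    # which is exactly the i satisfying times[i-1] < t <= times[i].
    return bisect_left(times, t)


print(active_packet_index([0, 3, 3, 7], 4))
```

When two consecutive times coincide (t_{i−1} = t_i), the packet p_i is never active, and the lookup skips it.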
At time t we have to augment two types of deflection sequences: one that corresponds to
deflections at time t, and one that corresponds to past deflections. First, consider a deflection
sequence corresponding to a present deflection. We have to define the first active packet in
this sequence. W.l.o.g. suppose that packet p is deflected from node v = (x_1, ..., x_r, ..., x_d)
to node u = (x_1, ..., x_r − 1, ..., x_d) (along direction (r, −)). Let c be the closest coordinate
greater than or equal to r (cyclically) such that (c, φ′) is a good direction for p at v. The first active
packet p_1 in the corresponding deflection sequence is the packet assigned to direction (c, φ′).
W.l.o.g. assume that φ′ is the + direction; i.e., p_1 advances to (x_1, ..., x_c + 1, ..., x_d).
Lemma 7 Destination coordinate c of both p and p_1 is at least x_c + 1. If c > r, then destination
coordinates r through c − 1 of both p and p_1 are x_r, ..., x_{c−1}.
Proof. The first claim follows since (c, +) is a good direction for p, and since p_1 advances
towards its destination coordinate c when it is assigned to direction (c, +). The second claim
follows from Rule 2, since if c > r, the interval of p with respect to c contains [r, c − 1], and
thus the interval of p_1 with respect to c must contain [r, c − 1] as well.
Now, consider a deflection sequence corresponding to a past deflection. Suppose that p_k is
the active packet in this sequence at time t. Let the desired direction of p_k at time t be (r, φ).
The next active packet p_{k+1} in this sequence is the packet that is assigned the direction (r, φ).
(It may continue to be p_k.) We set the desired direction of p_{k+1} at time t + 1 to be (c, φ′),
where c is the closest coordinate greater than or equal to r (cyclically) that is not fixed yet.
Lemma 8 The interval of p_{k+1} with respect to its new desired direction at time t + 1 contains
the interval of p_k at time t with respect to its desired direction at this time.
Proof. Let p_{k+1}’s interval with respect to (r, φ) at time t be [c′, r − 1] (if the interval is empty,
set c′ = r). Then, p_{k+1}’s interval with respect to (c, φ′) at time t + 1 is [c′, c − 1], since by
definition of c, all the coordinates between r and c − 1 are correct. So, the lemma clearly holds
if p_{k+1} = p_k. If p_{k+1} ≠ p_k, then, by Rule 1, since p_{k+1} is assigned to the desired direction of
p_k, the interval of p_k with respect to its desired direction (r, φ) must be contained in [c′, c − 1],
and the lemma holds in this case as well.
Lemma 9 If r = c then φ′ = φ.
Proof. W.l.o.g. assume φ = +. Let the r-th coordinate of p_{k+1} (as well as p_k) at time t be x_r.
By the first claim in Lemma 7, the r-th destination coordinate of p_{k+1} is at least x_r + 1. Since
p_{k+1} moves to x_r + 1 and r remains its desired coordinate, the value of this destination
coordinate is at least x_r + 2, which means that φ′ = +.
Consider a deflection sequence p_1, ..., p_ℓ and the corresponding time series t_1, ..., t_ℓ associated with a deflection of packet p from node v = (x_1, ..., x_r, ..., x_d) to node
u = (x_1, ..., x_r − 1, ..., x_d). Using the above three lemmata, we show the following claims. (The analogous
claims for a deflection to (x_1, ..., x_r + 1, ..., x_d) can be deduced similarly.)
Claim 10 Let t_1 ≤ t ≤ t_ℓ. Consider the path given by following the active packets in the
sequence up to time t. The r-th coordinate of every node in this path is at least x_r.
Proof. As long as the deflection path proceeds along the r-th coordinate, Lemmata 7 and 9
ensure that the value of the r-th coordinate is monotonically increasing. As soon as the deflection
path uses another coordinate, Lemmata 7 and 8 ensure that r is contained in the interval of
the current packet in the deflection sequence with respect to its desired direction. Therefore,
the value of the r-th coordinate does not change along the rest of the path.
Claim 11 For all t_1 ≤ t ≤ t_ℓ, the path given by following the active packets in the sequence
up to time t is a shortest path in which the coordinates are corrected in cyclical order.
Proof. By Lemmata 7 and 8, the sequence of coordinates that the deflection path traverses
is in cyclical order, i.e., it is bitonic, with one monotone segment starting at c ≥ r, followed by
another monotone segment starting at c′ ≥ 1 and ending at c″ < r. Lemmata 7 and 9 imply
that in each portion of the path traversing a single coordinate, the value of that coordinate
changes monotonically.
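The cyclical-order condition on the sequence of traversed coordinates can be checked mechanically. The sketch below is an illustration only; the name `in_cyclic_order` is hypothetical. It maps each coordinate c to its cyclic offset (c − r) mod d and tests that the offsets never decrease, which is equivalent to the bitonic shape described in the proof.

```python
def in_cyclic_order(coords, r, d):
    """Check that a path's coordinate sequence is in cyclical order.

    coords lists, step by step, the coordinate (1..d) along which the
    path moves; r is the coordinate at which the cyclic order starts.
    Returns True when the sequence visits r, r + 1, ..., d, 1, ...,
    r - 1 in that order, possibly skipping or repeating coordinates.
    """
    offsets = [(c - r) % d for c in coords]
    return all(a <= b for a, b in zip(offsets, offsets[1:]))


print(in_cyclic_order([3, 3, 4, 1, 1, 2], r=3, d=4))
print(in_cyclic_order([3, 1, 4], r=3, d=4))
```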
We conclude from these claims that following Rules 1 and 2 implies the conditions required
in Claim 1, which in turn implies the claimed delivery time bounds.
5 Conclusions and open problems
The many-to-many routing problem studied here is symptomatic of the difficulties in analyzing
hot potato algorithms. One intuitively believes that the “desired” dist (p) + 2(k − 1) bound
(being as weak as it is for problems such as permutation routing) should hold for all networks
via a “simple” and “practically appealing” algorithm. At the very least one suspects that for
each network there should be an algorithm achieving the desired bound and the analysis should
be reasonably direct using some easily constructed deflection path argument. But perhaps our
intuition is wrong!
Obviously many open problems remain for the subject of hot potato routing. Here we just
mention some of the problems most directly related to the results of this paper.
1. Is there an undirected network for which no deterministic hot potato algorithm can route
every many-to-many instance so that packet p arrives within dist (p) + 2(k − 1) steps?
Is it possible to establish an O(dist (p) + k) bound for all networks?
2. What is the worst case many-to-many bound for Hajek’s “shortest to go” priority methods when applied to the two-dimensional and higher dimensional meshes? Does every
such scheme achieve the “desired” bound? Does any such scheme achieve the desired
bound?
3. Can the bound given for the dimension by dimension algorithm be improved to the
desired bound in the fully loaded case?
4. For any of the algorithms given in this paper what is the best bound for permutation
routing?
5. Can one show that the “one-pass of links” property is a substantial restriction? That
is, can lower bounds be proven in this model that do not hold in general for hot potato
algorithms?
6. Is there an algorithm with a bounded number of packet/link–passes that achieves the
dist (p) + 2(k − 1) bound for meshes and tori?
References
[1] A.S. Acampora and S.I.A. Shah. Multihop lightwave networks: a comparison of store-and-forward and hot-potato routing. In Proc. IEEE INFOCOM, pages 10–19. IEEE Computer
Society Press, 1991.
[2] A. Bar-Noy, P. Raghavan, B. Schieber, and H. Tamaki. Fast deflection routing for packets
and worms. In Proc. 12th Symp. on Principles of Distributed Computing, pages 75–86,
August 1993.
[3] P. Baran. On distributed communications networks. IEEE Transactions on Communications, pages 1–9, 1964.
[4] I. Ben-Aroya, T. Eilam, and A. Schuster. Greedy hot-potato routing on the two dimensional mesh. Distributed Computing, 9(1), 1995. To appear.
[5] A. Ben-Dor, S. Halevi, and A. Schuster. Potential function analysis of greedy hot potato
routing. In Proc. 13th Symp. on Principles of Distributed Computing, August 1994. To
appear.
[6] J.T. Brassil and R.L. Cruz. Bounds on maximum delay in networks with deflection routing.
In Proc. 29th Allerton Conf. on Communication, Control and Computing, pages 571–580,
1991.
[7] U. Feige. Observations on hot potato routing. In Proc. 3rd Israel Symp. on the Theory of
Computing and Systems, pages 30–39, 1995.
[8] U. Feige and P. Raghavan. Exact analysis of Hot Potato routing. In Proc. 33rd Symp. on
Foundations of Computer Science, pages 553–562, October 1992.
[9] A.G. Greenberg and J. Goodman. Sharp approximate models of deflection routing in
mesh networks. IEEE Transactions on Communications, 41(1):210–223, January 1993.
[10] A.G. Greenberg and B. Hajek. Deflection routing in hypercube networks. IEEE Transactions on Communications, 40(6):1070–1081, June 1992.
[11] B. Hajek. Bounds on evacuation time for deflection routing. Distributed Computing, 5:1–6,
1991.
[12] C. Kaklamanis, D. Krizanc, and S. Rao. Hot potato routing on processor arrays. In Proc.
5th Symp. on Parallel Algorithms and Architectures, pages 273–282, 1993.
[13] D.H. Lawrie and D.A. Padua. Analysis of message switching with shuffle-exchanges in
multiprocessors. In Interconnection Networks. IEEE Computer Society Press, 1984.
[14] Y. Mansour and B. Patt-Shamir. Greedy packet scheduling on shortest paths. In Proc.
10th Symp. on Principles of Distributed Computing, pages 165–175, August 1991.
[15] N.F. Maxemchuk. Comparison of deflection and store and forward techniques in the
Manhattan street and shuffle exchange networks. In Proc. IEEE INFOCOM, pages 800–
809. IEEE Computer Society Press, 1989.
[16] R. Prager. An algorithm for routing in hypercube networks. Master’s thesis, University
of Toronto, September 1986.
[17] B. Smith. Architecture and applications of the HEP multiprocessor computer system. In
Proc. 4th Symp. on Real Time Signal Processing, pages 241–248. SPIE, 1981.
[18] A. Symvonis. A Note on Deflection Routing on Undirected Graphs. University of Sydney,
Department of Computer Science TR 493, November 1994.
[19] A. Symvonis. Private Communications, November 1994.
[20] T. Szymanski. An analysis of “Hot Potato” routing in a fiber optic packet switched
hypercube. In Proc. IEEE INFOCOM, pages 918–925. IEEE Computer Society Press,
1990.
[21] Z. Zhang and A.S. Acampora. Performance analysis of multihop lightwave networks with
hot potato routing and distance age priorities. In Proc. IEEE INFOCOM, pages 1012–
1021. IEEE Computer Society Press, 1991.