Magnet: Practical Subscription Clustering for
Internet-Scale Publish/Subscribe
Sarunas Girdzijauskas
Swedish Institute of Computer Science (SICS), Stockholm, Sweden
sarunas@sics.se
Gregory Chockler, Ymir Vigfusson, Yoav Tock, Roie Melamed
IBM Haifa Research Laboratory, Israel
{chockler,ymirv,tock,roiem}@il.ibm.com
ABSTRACT
An effective means for building Internet-scale distributed
applications, and in particular those involving group-based
information sharing, is to deploy peer-to-peer overlay networks. The key pre-requisite for supporting these types of
applications on top of the overlays is efficient distribution of
messages to multiple subscribers dispersed across numerous
multicast groups.
In this paper, we introduce Magnet: a peer-to-peer publish/subscribe system which achieves efficient message distribution by dynamically organizing peers with similar subscriptions into dissemination structures which preserve locality in the subscription space. Magnet is able to significantly
reduce message propagation costs by taking advantage of subscription correlations present in many large-scale group-based applications.
We evaluate Magnet by simulation, comparing its performance against a strawman pub/sub system which does not cluster similar subscriptions. We find that Magnet outperforms the strawman by a substantial margin on clustered subscription workloads produced using both generative models and real application traces.
1. INTRODUCTION
The Internet of tomorrow must be poised to support applications that involve large collections of users engaged in
group-based interactions and information sharing, including Internet TV (IPTV) [7, 23], collaborative editing [32],
and massive multi-player games [18, 30]. These applications
require a group communication substrate capable of dealing with immense numbers of users and multicast groups
in a scalable fashion. DHT-based peer-to-peer substrates
offer almost unlimited growth capacity and efficient routing functionality while incurring only a modest maintenance
DEBS '10, July 12-15, 2010, Cambridge, UK.
Copyright 2010 ACM 978-1-60558-927-5/10/07 ...$10.00.
overhead at each participant [26, 24]. They are an attractive design choice to serve as a basis for a scalable multicast solution. However, for reasons of connectivity and load
balancing, most existing DHTs support name-independent
routing topologies in which the node placement is entirely
determined by a uniform hash of its name, and hence independent of its geographical location, interest preferences,
and other node-specific attributes.
To provide efficient overlay-based multicast routing, a prerequisite is that peers who share the same (or similar) interests are well-clustered, i.e., separated from each other by
a small number of peers with different interests. Exploiting well-clustered interests may be accomplished by using
the techniques underlying locality-aware DHTs [24, 35, 1]
or metric embeddings [33]. These approaches, however, rely
on various assumptions about the distribution of node subscriptions, and are insufficient for supporting a general purpose multicast system wherein the participant subscriptions
are a priori unknown and may change over time.
In this paper, we introduce Magnet, an efficient peer-to-peer multicast system that supports the publish/subscribe (pub/sub) API and exploits well-clustered topic interests.
Magnet requires an underlying DHT which allows node
specific attributes (and their ordering) to be directly incorporated into the routing structure [5, 15, 14]. We used the
Oscar DHT [12, 13, 14] to dynamically cluster the nodes in
the Magnet overlay based on their subscription preferences.
Our choice of Oscar was motivated by its ability to construct topologies which are provably small-world [16, 11] and have a low maintenance overhead. Nonetheless, we believe that our techniques are general enough to also produce good results on top of other name-dependent DHT substrates, such as Mercury [5] and GosSkip [15].
At the core of Magnet is a clustering algorithm that takes
as an input the subscription of a node, and outputs the node
location (or equivalently, the node identifier) on a logical
ring, which is a part of the underlying Oscar DHT. The
goal is to ensure that the identifier values reflect similarity
in the subscription space, that is, the nodes with similar
subscriptions are assigned numerically closer identifiers than
those with dissimilar ones. Specifically, we define a similarity
metric sim over the set of all possible subscriptions S (that
is, S contains all subsets of T , the set of all topics) such
that for any two subscriptions s1 and s2 in S, sim(s1 , s2 ) =
|s1 ∩ s2 |/|s1 ∪ s2 |.
Dynamic clustering. Note that since the subscription
space can be arbitrarily large, and the input distribution of
the node preferences is a priori unknown, the mapping from
subscriptions to identifiers cannot be fixed in advance, but
should instead be computed dynamically based on the preferences of the nodes already in the overlay. In Magnet,
this is accomplished through a distributed membership service, which, for each topic t, maintains a random sample
of the current subscribers to t along with their interests.
Subsequently, a peer p who subscribes to t will first query
this membership service to determine which of t’s current
subscribers (as known to the membership service) has the most similar interests to p, using the similarity metric defined above; p will then join the ring next to that subscriber. Our experiments showed that effective clustering is possible even if the size of the subscribers' sample maintained by the membership service is very small; the sample can thus be maintained in a lazy fashion using a low-bandwidth background gossip protocol which has low impact on the overall system throughput.
Routing topology. Once a peer joins the logical ring, it is connected into a small-world routing topology maintained by the underlying Oscar overlay. The set of the peer's outgoing connections is augmented with a few additional long-range pointers (or fingers) chosen so that the probability of connecting to a node is inversely proportional to the ring-hop distance to the node.1 To estimate the locations of the long-range neighbors, Oscar maintains a digest of the identifier distribution on the ring. This digest is maintained by periodically sampling the node population using a series of random walks. Note that the overhead of maintaining this digest is small since, as shown in [14], a logarithmic number of random walkers suffices to reliably estimate the identifier distribution.
Message dissemination. In the final step of our construction, the underlying small-world routing structure is
leveraged to create locality preserving distribution trees. As
in [6], Magnet maintains a home location for each topic determined by uniformly hashing the topic name. The home
location for topic t serves as the root of the multicast tree
used to distribute the messages posted on t (and also as the root of the spanning tree used to maintain the samples of t's subscriber interests; see above). Unlike [6], the trees in
Magnet are created in a top-down fashion so that the paths
from the root to each of the subscribers coincide with the
point-to-point greedy routing paths from the root to those
subscribers in the overlay. The actual tree construction algorithm does not necessitate contacting the topic’s root on
each subscribe request. Instead, each new subscriber joins
the tree by following the routing path towards the topic’s
home location until a grafting point lying on the top-down
routing path from the root to that node is found (or the
topic’s home location is reached). We will argue that the
routing trees constructed in this way preserve locality in
the overlay, and therefore, maintain the desired subscription
clustering.
Although the techniques behind Magnet were devised for topic-based pub/sub, they may be generalized to content-based pub/sub by extending the notion of similarity to a multi-dimensional attribute space. The method, however, is out of the scope of this paper.

1 Strictly speaking, the probability of creating a link from node u to node v is inversely proportional to the integral of the probability density function of the peer identifier distribution between the identifiers of u and v in the identifier space. For a more detailed analysis please refer to our prior work [11].
Results on synthetic models. The improvement in propagation costs achieved by Magnet's clustering depends on the degree of similarity in the input node subscriptions. As we show in Section 4, the cost savings are most significant in subscription workloads that exhibit well-defined
structural dependency among the individual subscriber interests. For example, subscriptions to IPTV channels have
been shown to embody substantial correlation between users
[23], which is intuitive considering that news and other content on local channels are of primary interest to users located
within that geographical or administrative region.
Following the approach of Wong et al. [34], we generate
structurally correlated workloads by grouping the topics into
several categories (or modes) in either one or two dimensions, and then selecting subscriptions from one or several of those categories either deterministically or by using a power-law popularity distribution. Our findings show that on these workloads, Magnet saves between 20% and 80% of the message propagation costs to uninterested relays, relative to a strawman peer-to-peer pub/sub implementation which does not cluster subscriptions.
We present a new Hierarchical-Topics model to synthetically generate subscriptions that follow a hierarchical classification scheme. The idea is to first assign each user a home topic, then repeatedly pick some home topic with preference for higher popularity, and select another topic with preference for "similar" topics according to a binary classification hierarchy. We then make the subscriber of the first topic join the second one. We find that Magnet saves between 20% and 60% of the costs incurred by the strawman under this model.
Results on real-world subscription patterns. We
also evaluate our system on subscription patterns that arise
in large-scale collaborative applications. In particular, we
used a trace of all edits of Wikipedia articles by registered
users over a 6 year period to both directly evaluate Magnet
and to generate a model of real-world subscription patterns.
Our experiments determined that the strawman implementation involved 77% more uninterested peers in message distribution than Magnet.
In addition, all our experiments indicate that Magnet's performance adapts to the degree of correlation in the input subscriptions, and is, in particular, never worse than that exhibited by the strawman.
2. PRELIMINARIES
We begin with definitions and notation that will be used throughout the paper. We then briefly describe Oscar, our underlying DHT substrate, focusing on the properties that are relevant in the context of Magnet.
2.1 Definitions and Notation
We let T = {t1, . . . , tm} denote the set of all topics. We define a similarity metric, sim, to be the function mapping a pair of node subscriptions s1, s2 ⊆ T to the normalized size of their intersection, with the range [0, 1]. Formally,

sim(s1, s2) = |s1 ∩ s2| / |s1 ∪ s2|.
In some of our experiments, we will also consider a similarity metric weighted by the topic transmission rate. Specifically, for λi being the transmission rate of topic ti, sim(s1, s2) is defined to be

sim(s1, s2) = (Σi:ti∈s1∩s2 λi) / (Σi:ti∈s1∪s2 λi).
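As a sketch, the rate-weighted metric can be transcribed directly; the dict-based rate table is our own representation:

```python
# Rate-weighted subscription similarity: the sum of transmission rates
# over the shared topics, normalized by the sum over all topics in
# either subscription.

def weighted_sim(s1, s2, rate):
    """rate: dict mapping topic -> transmission rate λ."""
    union = s1 | s2
    if not union:
        return 0.0
    shared = sum(rate[t] for t in s1 & s2)
    total = sum(rate[t] for t in union)
    return shared / total

rates = {"a": 10.0, "b": 1.0, "c": 1.0}
print(weighted_sim({"a", "b"}, {"a", "c"}, rates))  # 10/12 ≈ 0.833
```

High-rate topics thus dominate the metric, so two peers sharing one busy topic count as more similar than two peers sharing one quiet topic.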
Identifiers. Each Magnet node p is connected into two
independent ring-based DHT structures: one, called the control DHT, for supporting the interest-based membership service and the topic’s home location, and the other one, called
the skewed DHT for clustering peers according to their interests. We will also use the terms “control” and “skewed” to
refer to the underlying ring structures maintained by those
DHTs. Consequently, p is assigned two identifiers, denoted id(p)c and id(p)s, for the control and the skewed DHTs, respectively. The routing table of p, RT(p), is the union of the control and skewed DHT routing tables RT(p)c and RT(p)s. We write succ(p)c and succ(p)s to denote p's successor on the control and skewed rings, respectively. The set of topics p is subscribed to is referred to as p's subscription (or interest), and denoted p.sub.
The skewed DHT connectivity is maintained by the Oscar protocol described below, while the control DHT can be supported by either Oscar itself or any of the existing name-independent DHTs, such as Chord and Pastry [26, 24].
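As a sketch, the per-peer state described above might be laid out as follows; all field names are illustrative rather than taken from the Magnet code:

```python
# Illustrative layout of a Magnet peer's state: two identifiers and two
# routing tables, one per ring, with RT(p) as their union.

from dataclasses import dataclass, field

@dataclass
class MagnetPeer:
    name: str
    sub: set = field(default_factory=set)   # p.sub: subscribed topics
    id_c: int = 0                           # id(p)c on the control ring
    id_s: int = 0                           # id(p)s on the skewed ring
    rt_c: set = field(default_factory=set)  # RT(p)c: control-ring links
    rt_s: set = field(default_factory=set)  # RT(p)s: skewed-ring links

    @property
    def rt(self):
        """RT(p): the union of the control and skewed routing tables."""
        return self.rt_c | self.rt_s

p = MagnetPeer("p", sub={"t1", "t2"}, rt_c={"q1"}, rt_s={"q2"})
print(sorted(p.rt))  # ['q1', 'q2']
```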
2.2 The Underlying Small-world DHT
The nodes in Oscar are organized into a logical ring structure augmented with additional long-range pointers, or fingers. As discussed in Section 1, the fingers are created based
on the actual distribution of the node identifiers in the input,
which can be arbitrarily skewed. To this end, Oscar performs periodic sampling of the node population in order to
estimate the current distribution, and re-wires the network
accordingly.
Specifically, each Oscar peer Pu executes the following
protocol (see Figure 1).
a) First, Pu simultaneously starts a constant number of random walks (5 in Figure 1) to sample the node population. The median p1 of the sample set is then used to estimate the median of the entire population from Pu's perspective.
b) Pu then proceeds in the same fashion by sampling the
sub-population occupying the range (Pu , p1 ) to estimate its median p2 .
c) Next, the range (Pu , p2 ) is sampled to estimate the
median p3 , and so on. Continuing in this fashion,
Pu will eventually learn the approximate locations of
k = O(log n) medians p1 , p2 , . . . , pk , which define k
partitions X1 , X2 , . . . , Xk of exponentially decreasing
size.
d) Pu then selects between 1 and k fingers so that the ith
finger, 1 ≤ i ≤ k, is selected by first choosing one of
the partitions Xj , and then picking a peer within this
partition.
Both selections are done uniformly at random. As we show elsewhere (see [14]), this protocol produces a small-world topology, which implies that each node is reachable from any other in at most O(log²(n)/k) hops by following a greedy routing procedure on the node identifiers.
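The partitioning idea behind steps (a)–(d) can be sketched with global knowledge standing in for the random-walk sampling; a real Oscar node estimates each median from a handful of samples instead of sorting the population, and all names here are ours:

```python
# Sketch of Oscar's finger selection (Figure 1): repeatedly halve the
# clockwise id range at its (estimated) median to obtain O(log n)
# partitions of exponentially decreasing size, then pick a partition
# and a peer within it uniformly at random.

import random

def partitions_by_halving(ids_sorted):
    """Split the id range into halves, quarters, eighths, ..."""
    parts = []
    rest = ids_sorted
    while len(rest) > 1:
        mid = len(rest) // 2
        parts.append(rest[mid:])   # the far half becomes one partition
        rest = rest[:mid]          # recurse on the nearer half
    parts.append(rest)
    return parts

def pick_finger(parts, rng):
    part = rng.choice(parts)       # partition chosen uniformly at random
    return rng.choice(part)        # peer within it, also uniformly

ids = list(range(1, 17))           # peers clockwise from Pu, in ring order
parts = partitions_by_halving(ids)
print([len(p) for p in parts])     # [8, 4, 2, 1, 1]
```

The exponentially shrinking partition sizes are what give the resulting links their small-world (distance-inverse) distribution.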
Figure 1: The Finger Selection Protocol in Oscar. (Panels a-d: samples gathered by random walkers; the median peer of each successive sample set, defining partitions of 1/2, 1/4, . . . of the population; the medians of all the sample sets, representing logarithmic partitions of the key space; and Pu choosing one partition u.a.r. and a peer within it u.a.r.)
Number of long-range links. The number of the fingers selected by a node is a parameter of the protocol, and
can vary from node to node (but must be at least 1 at each
node to maintain the small-world properties). Moreover,
since the long-range neighbors are selected from a range of
possibilities, there is an additional flexibility to incorporate
other criteria into the selection process, such as, e.g., the “power of two choices” [19]. In Magnet, we utilize this property to bias the long-range neighbor selection towards nodes with closer interests. Also, as we show in Section 4, a higher node degree is instrumental in improving connectivity among subscribers who are interested in the same topics and yet, due to imperfections of the clustering algorithm, ended up residing in disjoint ring regions.
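A minimal sketch of such an interest-biased selection, in the spirit of the power of two choices; the sampling parameter d and all names are our own:

```python
# Sketch: when a partition offers several candidate long-range
# neighbors, sample d of them uniformly and keep the candidate whose
# subscription is most similar to ours.

import random

def jaccard(s1, s2):
    return len(s1 & s2) / len(s1 | s2) if (s1 | s2) else 0.0

def pick_biased_finger(my_sub, partition, rng, d=2):
    """partition: list of (peer id, subscription set) candidates."""
    candidates = rng.sample(partition, min(d, len(partition)))
    return max(candidates, key=lambda c: jaccard(my_sub, c[1]))

rng = random.Random(0)
partition = [("u", {"x"}), ("v", {"a", "b"}), ("w", {"c"})]
peer = pick_biased_finger({"a", "b"}, partition, rng)
```

With d = len(partition) the choice degenerates to picking the most similar peer outright; small d keeps the selection mostly uniform, preserving the small-world link distribution.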
3. IMPLEMENTATION
The crux of Magnet's implementation is the node join protocol, which is executed every time a node subscribes to a new topic or drops one of the existing topics from its subscription (provided this is not the last topic it is subscribed to). The join protocol consists of three main steps:
a) First, the node acquires an identifier on the skewed
DHT (Oscar) based on its subscription, and joins the
skewed ring based on that identifier.
b) The node then connects to additional long-range neighbors as prescribed by the underlying Oscar DHT (see
Section 2).
c) Finally, the node joins the distribution tree for each
topic to which it subscribes.
The core mechanisms in the Magnet implementation are steps (a) and (c), which we describe in detail in Sections 3.1–3.3.
Both the identifier acquisition and distribution tree protocols rely on a new interest-aware membership service, which is responsible for maintaining (possibly partial) views of the interests of the nodes in the system. The interest-aware membership is a core part of Magnet, and its implementation is described in Section 3.4. For the description of the clustering and tree construction protocols (which are presented first), we assume that each node p maintains a local state variable view(p), populated by the membership service, which maintains the current mapping from the set of node identifiers on the skewed ring IDs = {id(q)s} to the set of node subscriptions.
Magnet's membership service implementation guarantees that for each topic t in p's subscription, view(p) includes the interest of at least one other node q which is also interested in t (unless p is the first to subscribe to t). This property is instrumental for improving both the clustering quality (since each node is guaranteed to see at least one node with a common subscription) and the performance of the tree join protocol (since a node can join t's tree through another node already in the tree, instead of always going through the root). The details of the distributed maintenance protocol for view(p) are given in Section 3.4.
3.1 Topic Home Locations
As in Scribe [6], each topic t is associated with a home location, home(t), which is determined by uniformly hashing t's name, and looked up using a ring-based control DHT. The home location of t serves as the root of two spanning trees: one built over the skewed overlay and used for disseminating the messages posted on t (see Section 3.3); and the other built over the control overlay and used for maintaining partial subscription views of the nodes interested in t (see Section 3.4).
As we mentioned in Section 2, any of the popular ring-based DHT implementations (such as, e.g., [26, 24, 35]) can serve as Magnet's control DHT, provided it guarantees logarithmic routing latency under the assumption of uniformly distributed node and object identifiers. Accordingly, the implementation details of the control DHT are omitted in the remainder of this paper.
3.2 Identifier Acquisition Protocol
The identifier acquisition protocol is depicted in Algorithm 1, and is the core part of the Magnet clustering implementation. It is executed whenever a node p first joins Magnet, and every time it changes its interest (that is, subscribes or unsubscribes to a topic2). Its goal is to ensure that nodes with close subscriptions (as indicated by the subscription similarity metric in Section 2) are assigned identifiers which are numerically as close to each other as possible. Note that rejoining when interests change is only necessary to maintain the clustering and does not affect the correctness of our system. Our system is designed to be flexible and adaptive, allowing each Magnet node to decide locally how often a change of its identifier is permitted, depending on the node's load, available resources, and so forth. Thus, even when subscriptions change frequently, nodes are free to remain stable and retain their existing locations in the identifier space. The connectivity of Magnet guarantees that the system remains fully operational, and assures that the performance is never worse than that of a pub/sub system built on a name-independent DHT. Consequently, the rejoin mechanism may be used sparingly in practice, for instance by requiring a minimum number of interest changes between rejoins.

The algorithm starts by inspecting view(p) to discover a node q such that q = arg max{sim(p, q′) : q′ ∈ view(p)}, breaking ties randomly. Node p then joins the ring between q and q's ring successor. If p fails to make contact with q (e.g., due to a failure), then q is excluded from view(p) and the entire join algorithm is re-executed. If view(p) is empty at the time p joins, then p joins the ring at a location determined by uniformly hashing its identifier.

Whenever view(p) changes, p may attempt to improve its location by re-executing the join protocol. Note, though, that this is not strictly necessary, since other nodes will take p's present interest (and location) into account when they join the ring or change their subscriptions.

Algorithm 1 The identifier acquisition protocol for peer p: id(p)s = getLocation(p, view(p))
1: if view(p) ≠ ∅ then
2:   find q such that sim(p.sub, q.sub) = max{sim(p.sub, q′.sub) : (id(q′)s, q′.sub) ∈ view(p)}
3:   id(p)s := mean(id(q)s, id(succ(q)s)s)
4: else
5:   id(p)s := hash(p.name)
6: end if
7: return id(p)s

Observe that since the identifiers are chosen from a one-dimensional space, it is impossible to guarantee that the identifiers of nodes with close subscriptions will always be sufficiently close numerically to be well-clustered. However, as we show in our experiments, one-dimensional clustering turns out to work quite well for a wide range of realistic subscription patterns. Extending the Magnet techniques to better support multi-dimensional subscription correlation is the subject of future work.

3.3 The Tree Join Protocol
Once the node p’s identifier on the skewed ring is fixed,
and p is connected into the Oscar overlay, the node will
proceed to join the multicast tree for each topic to which it
subscribes. In the following, we describe the steps taken by
p to join the multicast tree T (t) for one such topic t.
The tree join protocol consists of three main phases (see Algorithm 2). In the first phase (lines 2.1–6), p consults view(p) to find another node ps interested in t which already belongs to t's multicast tree. If no such node is found, then the tree's root, home(t), is used in its stead. During the next phase (lines 2.7–11), p traverses t's tree upwards starting at ps, until reaching a node pg such that pg is either home(t), or has a finger q in its skewed DHT routing table such that q is the next hop on the greedy routing path from pg to p, and (pg, q) is already an edge of T(t). The tree join protocol then enters the final phase (lines 2.12–14), in which the greedy routing path towards p is followed until encountering a node r that has p in its finger table. At this point, p joins T(t) as a child of r.
By induction over the node join events, it is easy to see
that for each subscriber p of t, the path from home(t) to p
in the resulting tree T (t) will coincide with a greedy routing path from home(t) to p on the skewed DHT. Since each
2 The worst case latency of rejoining the network upon a subscription change is O(log N), and its communication complexity is at most O(t log N), where t is the node's subscription size and N is the total number of nodes.
Algorithm 2 Multicast tree join algorithm joinTree(p, t)
; Phase 1:
1: S := {q : (q, q.sub) ∈ view(p) ∧ q ∈ T(t)}
2: if S ≠ ∅ then
3:   ps := q ∈ S, chosen uniformly at random
4: else
5:   ps := home(t)
6: end if
; Phase 2:
7: Traverse T(t) from ps upwards until reaching pg such that:
8:   (1) pg = home(t), ∨
9:   (2) ∃ id(q)s ∈ RT(pg)s such that:
10:    (a) id(q)s is the closest to p (from below) ∧
11:    (b) (pg, q) ∈ T(t).edges
; Phase 3:
12: Greedily route from pg to p until reaching r such that:
13:   id(p)s ∈ RT(r)s
14: Join T(t) as a child of r
consecutive step of the greedy routing procedure exponentially decreases the distance to the destination in the identifier space, nodes with close identifiers will also be close to each other on the greedy routing path. We conclude that T(t) preserves locality in the identifier space, and therefore also in the subscription space (to the extent that it is maintained by the identifier acquisition protocol in Section 3.2).
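The three phases of Algorithm 2 can be sketched on a toy tree, with dictionaries standing in for the distributed traversals and greedy routing; all helper names and the global-knowledge setup are ours:

```python
def join_tree(p, root, start, parent, greedy_path, knows):
    """Toy three-phase join.

    start:       a subscriber already in the tree (Phase 1 output), or root
    parent:      current tree edges, child -> parent
    greedy_path: the greedy routing path from root towards p's identifier
    knows:       node -> set of identifiers in its skewed finger table
    """
    # Phase 2: climb from `start` towards the root until hitting a node
    # that lies on the root-to-p greedy path (the grafting point).
    node = start
    while node != root and node not in greedy_path:
        node = parent[node]
    # Phase 3: follow the greedy path downwards until a node that has p
    # in its finger table; p joins the tree as its child.
    i = greedy_path.index(node)
    while p not in knows[greedy_path[i]]:
        i += 1
    parent[p] = greedy_path[i]
    return parent[p]

tree = {4: 0, 5: 4}                 # child -> parent; root is 0
fingers = {0: {4}, 4: {6}, 6: {7}, 5: set()}
print(join_tree(7, root=0, start=5, parent=tree,
                greedy_path=[0, 4, 6], knows=fingers))  # 6
```

Because the new edge always lands on the root-to-p greedy path, the toy run reproduces the invariant argued above: tree paths coincide with greedy routing paths.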
Multicast tree maintenance. The tree structure is maintained by having each node periodically ping its parent in the tree using a heartbeat message. The node's parent is declared disconnected if it fails to respond to a preconfigured number of heartbeats. At this point, the node issues a new joinTree request, which reconnects the node to the closest available branch of the tree in terms of overlay hops. Such DHT-based tree maintenance is known to be robust (see, e.g., Scribe [6] and Bayeux [36]). When the network is stable, messages are delivered to all their subscribers deterministically; whenever a failure occurs, the underlying small-world DHT is robust enough to allow the involved nodes to recover quickly, restore the overlay connectivity, and heal the affected trees by repeating the tree join procedure and bypassing the failed node(s) through alternative paths.
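A minimal sketch of the heartbeat logic described above; the threshold and all names are illustrative:

```python
# Sketch of heartbeat-based parent liveness tracking: a missed-heartbeat
# counter that, once it reaches the configured limit, declares the
# parent disconnected and signals that a new joinTree request is due.

class TreeMember:
    def __init__(self, parent, max_missed=3):
        self.parent = parent
        self.max_missed = max_missed
        self.missed = 0

    def on_heartbeat_ack(self):
        self.missed = 0                     # parent responded: reset counter

    def on_heartbeat_timeout(self):
        self.missed += 1
        if self.missed >= self.max_missed:  # parent declared disconnected;
            self.parent = None              # caller should issue joinTree
            return True
        return False

m = TreeMember(parent="q")
print([m.on_heartbeat_timeout() for _ in range(3)])  # [False, False, True]
```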
3.4 Interest-Aware Membership
Magnet's membership service implementation maintains partial views of the node subscriptions, and is based on randomized sampling of the interests of the entire node population. Our currently implemented sampling strategy maintains a separate sample of the interests of the subscribers of each particular topic t, and propagates this sample to all the current subscribers of that topic. This
ensures that each subscriber p of t knows of the interest of
at least one other subscriber of t (unless p is the first node to
subscribe to t), as required by the tree join protocol above.
Sampling protocol. For each topic t, the interest sampling protocol is implemented as follows. The subscribers
of t are maintained in a spanning tree built over the edges
of the control DHT, and rooted at home(t). The tree is
maintained dynamically, driven by the arrival of new subscribers to t, as well as the departure of existing ones (due to either an explicit unsubscribe request or a failure). The tree maintenance protocol is based on the same techniques as those of [6], and will not be discussed further here.
The sampling protocol executes in rounds, each of which is
triggered by either a passage of time, or an explicit “round
start” message multicast by home(t). In each round, the node subscriptions are propagated layer by layer in a bottom-up fashion, starting from the leaves of t's tree and ending at home(t). Upon receiving the subscription samples from its direct descendants, each inner node q combines them with its own interest and, possibly, truncates the sample if it includes more than a configured number k of node interests. The truncation is done by choosing k interests uniformly at random and discarding the rest. The sampling round terminates once home(t) is reached, at which point home(t) propagates the resulting view downstream to t's subscribers.
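The per-round aggregation at an inner node of the sampling tree might be sketched as follows; the dict-based sample representation and all names are ours:

```python
# Sketch of a sampling round at an inner node q of t's spanning tree:
# merge the children's samples with q's own interest, then truncate to
# at most k entries chosen uniformly at random.

import random

def combine_samples(own_id, own_sub, child_samples, k, rng):
    """child_samples: list of dicts mapping node id -> subscription set."""
    merged = {own_id: own_sub}
    for sample in child_samples:
        merged.update(sample)
    if len(merged) <= k:
        return merged
    kept = rng.sample(sorted(merged), k)  # keep k node ids uniformly at random
    return {node: merged[node] for node in kept}

rng = random.Random(42)
children = [{"a": {"t1"}, "b": {"t1", "t2"}}, {"c": {"t2"}}]
view = combine_samples("q", {"t1"}, children, k=2, rng=rng)
print(len(view))  # 2
```

Because every inner node bounds its outgoing sample to k entries, the per-round traffic stays constant per edge regardless of the topic's subscriber count.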
Practical considerations. The scheme above guarantees that at steady state, every subscriber of t will have a consistent view of the interests of t's other subscribers. Also, as we show in Section 4, effective clustering is possible even if the number of node interests in the sample is very small. The sample collection can therefore be implemented efficiently, even under relatively high churn, provided the average size of a node's interest is not too large. One approach to dealing with large interests is to replace topic names
in the propagated interests with their hashes. Another is to
use hybrid sampling strategies combining Bloom filters and
gossip-based sampling for large topics and tree-based sampling for small and medium ones. Further comparison of
different approaches for maintaining partial interest views is
the subject of our ongoing study.
4. PERFORMANCE EVALUATION
We implemented Magnet in a simulated setting to evaluate the effect of clustering on the message propagation cost.
Recall that the level of clustering is an artifact of the correlation between user interests in the input. We used several synthetic models of real-world user interest correlation,
including one of our own, as well as a real-world trace to
drive our experiments. We compare the cost of propagating
messages over the Magnet trees against that exhibited by
a strawman implementation, in which the propagation trees
are constructed directly on top of a name-independent DHT. In effect, our strawman implementation is expected to exhibit behavior similar to that of the Scribe [6] system. We
measure the propagation cost as the number of uninterested
relay nodes on the distribution trees. In this way, we also
indirectly evaluate the reduction of bandwidth consumption
in the system, which is a direct function of the number of
relays. In what follows, we will explain each model and data
set separately and evaluate Magnet and the strawman implementation on each of them.
4.1 Overview of Models
Multi-Modal and Spatial. We first consider synthetic
generative models for user interests inspired by Wong et
al. [34]. They attempt to capture structural dependency
among the subscriber interests as found in applications such
as network games or news dissemination. We considered two
models of this type, Multi-Modal and Spatial, which are described in detail in Sections 4.2 and 4.3, respectively.
Hierarchical-Topics. In addition to the structurally
correlated workloads, we also considered the subscription
Figure 2: Multi-Modal model: Distribution of the number of categories (modes) chosen by peers (the average is 19). (Histogram: peer degree, 15-30, on the x-axis; number of peers, 0-2000, on the y-axis.)
patterns arising in large-scale collaborative applications, such
as Wikipedia or Yahoo! Groups. Although the statistical
structure of the user preferences in these applications is not
yet well understood, empirical evidence suggests that the
topic popularity distribution in these applications follows
a power-law distribution3 with the α parameter ranging between 2 and 3 [28, 20]. Unfortunately, the simple technique of populating topic subscriptions by iteratively selecting a random subset of users, whose size is drawn from a power-law distribution, fails to capture the more complex dynamics of group overlaps: human users tend to favor topics popular among other users with similar roles or interests.
In our Hierarchical-Topics model, we instead make
use of the preferential attachment model which is known to
generate a random graph with a power-law degree distribution [20]. We augmented the basic preferential attachment
model by embedding the topic space within a tree structure
which models the hierarchical refinement of the interests.
For example, topics such as “Hardware Companies” and “Software and Services” both refine a broader category called “Technology Stock”. The resulting model, which we
call the Hierarchical-Topics model, is described in more
detail in Section 4.4.
Wikipedia: We obtained a trace of a real-world large-scale collaborative system, namely a trace of all edits of Wikipedia articles by registered users over a 6-year period [10].
We ran one experiment in which the Magnet simulation was
fed the subscription patterns extracted from the Wikipedia
trace. In this experiment, we modeled topics as articles and
user subscriptions as the set of the articles edited by that
user. We describe the results of the Wikipedia trace in
Section 4.5.
4.2 Multi-Modal Model
In the Multi-Modal model [34], the topic space is partitioned into a fixed number bn of categories (or modes).
³In the power-law distribution (also called Pareto or Zipf), the fraction of topics of popularity x is roughly 1/x^α for a constant value of α.
The peer subscription is generated by first choosing bp categories out of bn uniformly at random, and then selecting a
topic from those categories following a power-law popularity
distribution with parameter α. The subscription generation
proceeds until the average peer has subscribed to a desired
number of topics, which is a parameter of the model. The
Multi-Modal model is a good match for applications such
as news dissemination where user preferences are determined
by their geographical or administrative location or both.
The degree of correlation among the peer interests can be
adjusted by changing the bp and bn parameters while keeping
the bp/bn ratio intact. In other words, the resulting topic frequencies will be the same, although the correlation between
the peer subscriptions will be the highest when bp → 1 and
the lowest (uncorrelated) when bp → bn .
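The generation procedure above (pick bp of bn categories, then draw topics with power-law popularity until the subscription size is reached) can be sketched in a few lines. This is a minimal sketch: the function names, the even split of topics into categories, and the exact per-peer stopping rule are our simplifications, not details from [34].

```python
import random

def power_law_index(n, alpha, rng):
    """Pick an index in [0, n) with Pr[i] proportional to 1/(i+1)^alpha."""
    weights = [1.0 / (i + 1) ** alpha for i in range(n)]
    return rng.choices(range(n), weights=weights)[0]

def multi_modal_subscriptions(num_peers, num_topics, bn, bp, avg_size,
                              alpha=1.0, seed=0):
    """Generate one subscription set per peer under the Multi-Modal model:
    split the topics into bn equal categories, let each peer pick bp of
    them uniformly at random, then draw topics from the chosen categories
    with power-law popularity until the peer holds avg_size distinct topics."""
    rng = random.Random(seed)
    per_cat = num_topics // bn
    categories = [list(range(c * per_cat, (c + 1) * per_cat))
                  for c in range(bn)]
    subs = []
    for _ in range(num_peers):
        chosen = rng.sample(range(bn), bp)
        s = set()
        while len(s) < avg_size:
            cat = categories[rng.choice(chosen)]
            s.add(cat[power_law_index(len(cat), alpha, rng)])
        subs.append(s)
    return subs

# The paper's default ratio bp/bn = 0.1, e.g. 2 categories out of 20.
subs = multi_modal_subscriptions(100, 1000, bn=20, bp=2, avg_size=50)
```

Shrinking bp (toward 1) concentrates each peer's interests in fewer categories, which is exactly the knob that raises subscription correlation.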
Default values. Unless stated otherwise, the workloads
produced by the Multi-Modal model were used for the
network, consisting of 10,000 nodes, where each node had on
average 19 links (overlay edges). The resulting peer degree
distribution is shown in Figure 2. Each node subscribes to a
random subset of 50 out of 1000 distinct topics. The power-law topic popularity parameter within each category is set
to α = 1 by default. Our algorithms used 10 samples for
every topic, as described in Section 3.4.
Publication rates. We also evaluated the Magnet performance under various publication rates for each topic. In
one experiment every topic was assigned a different publication rate, which was drawn from a power-law distribution with α = 0.75, while in the other the publication rate
was uniform. Figure 3 shows that Magnet outperforms the
strawman implementation under both publication rate scenarios even with low subscription correlations. As expected,
Magnet performed better for workloads with non-uniform
publishing rate, because the rate is always taken into account by Magnet's peer placement algorithms when calculating the similarity distance among peers.
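One way publication rates can enter a similarity distance is to weight each shared topic by its rate. The weighted-Jaccard form below is our own illustration of the idea, not Magnet's actual metric (which is defined in Section 3).

```python
def rate_weighted_similarity(subs_a, subs_b, rate):
    """Weighted-Jaccard similarity between two subscription sets:
    shared topics count proportionally to their publication rate,
    so agreeing on a high-rate topic matters more than on a quiet one."""
    union = subs_a | subs_b
    if not union:
        return 0.0
    shared = sum(rate.get(t, 0.0) for t in subs_a & subs_b)
    total = sum(rate.get(t, 0.0) for t in union)
    return shared / total if total else 0.0

rate = {"news": 10.0, "chess": 1.0, "opera": 1.0}
a = {"news", "chess"}
b = {"news", "opera"}
# Sharing the high-rate "news" topic dominates: 10 / 12 ≈ 0.833
sim = rate_weighted_similarity(a, b, rate)
```

Under a uniform rate this degenerates to plain Jaccard similarity, which is consistent with the smaller gains observed for uniform publication rates.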
Varying correlations. We performed extensive simulations to verify whether Magnet can exploit the subscription
correlation generated by Multi-Modal model. We fixed
the ratio of bp /bn to 0.1 and 0.2, and varied bp and bn from
mostly correlated (every peer chooses one mode out of 10
and 5 modes respectively) to the least correlated (5 modes
out of 50 and 25 respectively). In the reported experiments,
every peer has been assigned 50 unique topics on average.
Figure 5 shows the fraction of messages sent via uninterested peers in Magnet as compared to the strawman for
the most correlated (Figure 5(a), bp = 1, bn = 10) and the
least correlated case (Figure 5(b), bp = 5, bn = 25) with
the power-law topic publication rate (α = 0.75). Publishers
send messages to topics in decreasing order of popularity
with number of messages sent to each in accordance to the
topic publication rate. The plots show the fraction of messages that were sent to uninterested peers for the first x
topics over all messages sent to these topics. The graphs
reveal almost no overhead for the most popular topics (the
first 100 topics), thus confirming that a good quality of
clustering is achieved under those workloads.
Effects of topic popularity. We have also measured
Magnet’s performance under different subscription sizes at
each peer and the impact of different topic distribution scenarios. We vary the α parameter of the power-law topic
popularity distribution, which directly influences the number of most popular topics in the system.
(a) Uniform topic publication rate. (b) Power-law topic publication rate.
[Plots: cost savings using Magnet (%) vs. categories per peer (bp), for bp/bn = 0.1 and bp/bn = 0.2.]
Figure 3: Multi-Modal model: Cost is measured by the number of uninterested relays who receive a message.
The remaining model parameters are specified in Section 4.2. The ratio bp/bn for the number of categories
joined by each peer is fixed; higher values of bp imply less correlation in the model. Magnet adapts well to
subscription correlations (left) and non-uniform publishing rates (right). In the former case, Magnet benefits
by clustering groups with similar membership; in the latter, it benefits from ensuring that the subscribers of
the relatively few high-rate topics are close to the source.
(a) Varying the power-law popularity parameter. (b) Varying the per-peer subscription size.
[Plots: cost savings using Magnet (%) vs. the power-law parameter (α) and vs. the avg. number of topics per peer (out of 1000), for power-law and uniform topic popularity within categories.]
Figure 4: Multi-Modal model: System performance while varying topic popularity distributions (left) and
the subscription size (right). The number of peers is 5000 and the average number of subscriptions per peer
is 18. On the right, the power-law topic popularity parameter within each category is set to α = 0.75.
Figure 4(a) shows the performance of Magnet given different α values with the Multi-Modal model (bp = 1 and
bn = 5) and uniform publication rate. We see that Magnet
performs better with the higher α values since the steep
power-law function pushes more peers to subscribe to the
same few popular topics, thus increasing subscription correlation.
Varying subscription sizes. Figure 4(b) shows the performance of Magnet with uniform publication rate as compared to the non-uniform one (power-law with α = 0.75)
while varying the peer subscription sizes. The results show
that Magnet’s algorithms consistently outperform the unclustered DHT-based pub/sub system on all subscription
sizes.
4.3 Spatial Model
In the Spatial model [34], the users are distributed uniformly at random on a unit square (1 × 1) and each user is
associated with a single topic which is unique to that user.
The subscriptions are generated so that each user is interested in the topics associated with the users located within
radius r from its own location on the unit square. The Spatial model predicts subscription patterns typical in network games, where players are most likely to interact with those located close to them in the virtual game space [30, 18].
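The Spatial model takes only a few lines to generate. The sketch below follows the description above; the function name and the convention that a user subscribes to its own topic (distance zero) are our assumptions.

```python
import math
import random

def spatial_subscriptions(num_users, r, seed=0):
    """Spatial model: each user sits at a random point on the unit square
    and owns one unique topic; a user subscribes to the topic of every
    user within Euclidean distance r (including its own)."""
    rng = random.Random(seed)
    pos = [(rng.random(), rng.random()) for _ in range(num_users)]
    subs = []
    for xi, yi in pos:
        subs.append({j for j, (xj, yj) in enumerate(pos)
                     if math.hypot(xi - xj, yi - yj) <= r})
    return subs

subs = spatial_subscriptions(500, r=0.1)
# Tune r until this average hits the target subscription size t.
avg = sum(len(s) for s in subs) / len(subs)
```

Note that interest is symmetric by construction (if user i subscribes to j's topic, j subscribes to i's), which is the source of the strong correlation the model exhibits.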
Results. In our experiments, we varied the value of r so
that on average every participant is interested in t topics,
and experimented with the values of t being 8, 16 and 32.
(a) High correlation between peers (bp = 1, bn = 10). (b) Low correlation between peers (bp = 5, bn = 25).
[Plots: number of messages sent to each topic (left) and percentage of messages received by uninterested relays (right) vs. topics by decreasing popularity, for Magnet and the strawman (unclustered DHT).]
Figure 5: Multi-Modal model: Publishers send messages to every topic in decreasing order of popularity.
The rate of traffic on the topics follows a power-law distribution, as shown on the left. The plots on the right
show the CDF of cost (messages received by uninterested relays) normalized by the total number of messages
sent to the first x topics. Magnet is able to reduce cost by exploiting the correlation between user interests
in the model compared to the strawman.
We measured the performance of Magnet by publishing on
all the topics in the system.
Figure 6 shows the relative decrease in message cost of
Magnet as compared to the strawman implementation with
varying t values for different sizes of the network. We can
observe that Magnet is highly efficient and saves almost
80% of unwanted messages over the strawman implementation for very large networks (10,000 nodes) with large peer
subscriptions (32 on average). The results are not surprising since the Spatial model produces highly correlated peer
subscription patterns. The correlation increases as the number of topics each peer subscribes to grows, making spatially
driven applications with many users (e.g., online network
games) some of the most favorable environments for Magnet’s deployment.
4.4 Hierarchical Topics Model
The main ingredient in our Hierarchical-Topics model
is to embed the topics as leaves of a hierarchy such that
nodes that are close together in the tree (have short tree distance) are more similar and should thus share more common
users. The technique to populate the hierarchy is similar to
Kleinberg’s tree model for decentralized search [17].
As mentioned earlier, crafting a generative model for group
subscription which displays the power-laws that have been
observed in real-life social data sets is an open problem [28].
We will not attempt to solve the challenge here — instead
we devise a model that leverages one of the most prominent
models to generate a power-law degree distribution and is
used to model the web hyperlink graph: the preferential attachment model [20]. In this model, nodes join the network
one at a time and construct an edge to another node with
probability proportional to that node’s current degree.
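The attachment step just described can be sketched with the standard repeated-endpoint trick (our minimal version; the single seed edge and the one-edge-per-new-node simplification are ours, and real generators typically add edges in batches):

```python
import random

def preferential_attachment(num_nodes, seed=0):
    """Grow a graph one node at a time; each new node attaches to an
    existing node chosen with probability proportional to its degree.
    Seeded with a single edge so initial degrees are non-zero."""
    rng = random.Random(seed)
    # targets holds one entry per edge endpoint, so sampling uniformly
    # from it samples nodes proportionally to their degree.
    targets = [0, 1]
    edges = [(0, 1)]
    for new in range(2, num_nodes):
        old = rng.choice(targets)
        edges.append((new, old))
        targets.extend([new, old])
    return edges

edges = preferential_attachment(1000)
```

The resulting degree sequence follows a power law, which is the property the Hierarchical-Topics model below inherits for topic popularity.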
Our model works as follows. The idea is to first bootstrap topics to be non-empty and follow a rough power-law
distribution, giving users at least one “home” topic. We let
parameter λ represent homophily, the tendency for peers to
subscribe to topics which are similar to their existing interests. After the initialization, the peers in the popular topics
then join other topics iteratively with a preference for those
Figure 6: Spatial model: As the average number of subscriptions per peer increases, Magnet's performance improves due to correlations stemming from locality. [Plot: cost savings using Magnet (%) vs. topics per peer (out of 1000), for networks of 10,000, 2500, and 1000 nodes.]
Figure 7: Hierarchical-Topics model: Complementary CDF (CCDF) of topic popularity with 2^14 nodes, 2^14 topics, 16 topics/node on average (initFrac = 10%, α0 = 2, λ = 2). The green fit line is the CCDF of a power-law distribution with α = 1.68.
close by in the hierarchy according to the λ parameter.
a) We start by populating all topics with random peers
such that the topic popularity follows a power-law distribution with exponent α0 . We add subscriptions iteratively until the target average degree of Z · initFrac
is reached, where Z is the model parameter representing the average number of subscriptions by each peer.
The initFrac parameter effectively characterizes what
fraction of the total number of links should be picked
at random during the initialization.
b) The topics are then organized as the leaves of a binary tree. We then repeat the following steps until the
target average peer subscription size Z is reached.
1) Topic t is picked with probability proportional
to t’s popularity. This step is a variation of the
preferential-attachment model [20].
2) Next, peer p is picked uniformly at random from
the list of all the t’s subscribers.
3) Let ℓ be a random variable representing tree distance, such that Pr[ℓ = x] = C·e^(−λx), where x can
be at the most the height of the hierarchy and C
is a normalizing constant. Peer p now subscribes
to topic t′ , which is picked uniformly at random
among topics at distance ℓ from t.
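The steps above can be sketched as follows. This is a simplification under our own assumptions: topics are the leaves of a complete binary tree, tree distance is taken as the height of the lowest common ancestor, and we restrict ℓ ≥ 1 so every draw proposes a topic different from t.

```python
import math
import random

def lca_height(a, b):
    """Height of the lowest common ancestor of two leaves of a complete
    binary tree (0 when a == b); we use this as the tree distance."""
    h = 0
    while a != b:
        a //= 2
        b //= 2
        h += 1
    return h

def hierarchical_topics(num_peers, height, Z, init_frac, alpha0, lam, seed=0):
    """Sketch of the Hierarchical-Topics generator. Topics are the
    2**height leaves of a binary tree. Step (a) seeds subscriptions with
    power-law topic popularity until the average degree is Z*init_frac;
    step (b) grows them: pick topic t proportionally to popularity, pick
    one of its subscribers p, sample a distance l with Pr[l=x] ~ exp(-lam*x),
    and subscribe p to a uniform topic at tree distance l from t."""
    rng = random.Random(seed)
    num_topics = 2 ** height
    init_weights = [1.0 / (t + 1) ** alpha0 for t in range(num_topics)]
    members = [[] for _ in range(num_topics)]  # subscribers of each topic
    subs = [set() for _ in range(num_peers)]

    def subscribe(p, t):
        if t not in subs[p]:
            subs[p].add(t)
            members[t].append(p)

    # (a) random initialization with power-law topic popularity
    while sum(len(s) for s in subs) < num_peers * Z * init_frac:
        t = rng.choices(range(num_topics), weights=init_weights)[0]
        subscribe(rng.randrange(num_peers), t)

    # (b) preferential, homophilous growth
    dist_weights = [math.exp(-lam * x) for x in range(1, height + 1)]
    while sum(len(s) for s in subs) < num_peers * Z:
        pops = [len(m) for m in members]
        t = rng.choices(range(num_topics), weights=pops)[0]
        p = rng.choice(members[t])
        l = rng.choices(range(1, height + 1), weights=dist_weights)[0]
        # leaves whose LCA with t sits exactly at height l form the
        # sibling half of t's size-2**l block
        block, half = (t >> l) << l, 1 << (l - 1)
        off = rng.randrange(half)
        t2 = block + (off + half if t & half == 0 else off)
        subscribe(p, t2)
    return subs

subs = hierarchical_topics(num_peers=100, height=5, Z=6,
                           init_frac=0.2, alpha0=2, lam=2, seed=3)
```

By construction, lca_height(t, t2) == l for every proposed topic, so a larger λ biases growth toward the immediate siblings of a peer's existing topics.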
Figure 7 shows a workload produced by running the model
with 2^14 nodes, 2^14 topics, Z = 16 topics per node on average, initFrac = 10%, α0 = 2 and λ = 2.
Results. Like in our previous experiments, we studied
how well Magnet can exploit the correlation among the
peer subscriptions generated by the Hierarchical-Topics
model. We fixed the number of both topics and peers to
2^14 and the exponent of the initial power-law distribution to
α0 = 2. Since the main parameters affecting the correlation
rate among the peer subscriptions are λ and initFrac, for
the first set of experiments we have investigated Magnet’s
behavior as we vary those parameters.
Figure 8(a) shows Magnet’s performance with the average number of topics per peer set to Z = 16. It is evident
that Magnet performs better than the strawman implementation with high values of λ and low values of initFrac
since these produce the most correlated subscription patterns. In the second set of experiments we have fixed the
value of initFrac to 10% and studied Magnet's performance under different values of the average subscription
size Z. Since smaller values of Z imply
higher correlation among the peer subscriptions in the model,
Magnet performed best with Z = 8 (see Figure 8(b)).
4.5 Wikipedia Subscription Patterns
We have also analyzed the performance of Magnet using
the subscription workload extracted from the trace of all
edits of Wikipedia entries by registered users over a 6 year
period. In this experiment, each entry of the encyclopedia
was treated as a topic and each unique editor as a Magnet
peer. The entries edited by a specific user were interpreted as the interests of the corresponding peer.
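Extracting such a workload from an edit log amounts to grouping edits by editor. The sketch below uses a made-up (editor, article) record format; the actual format of the trace from [10] may differ.

```python
from collections import defaultdict

def subscriptions_from_edits(edit_log):
    """Turn (editor, article) edit records into per-peer subscriptions:
    each article is a topic, and each editor is a peer interested in
    every article it has edited (duplicate edits collapse into a set)."""
    subs = defaultdict(set)
    for editor, article in edit_log:
        subs[editor].add(article)
    return dict(subs)

log = [("alice", "Chord"), ("bob", "Pastry"),
       ("alice", "Pastry"), ("alice", "Chord")]
subs = subscriptions_from_edits(log)
# subs == {"alice": {"Chord", "Pastry"}, "bob": {"Pastry"}}
```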
For our experiments, we selected 3000 random topics from the entire entry set, which were edited by nearly
10,000 unique users. The topic popularity varies from 1 to
348 subscribers per topic, and on average every topic has 5.4
subscribers. These subscription data were fed to Magnet
and to the strawman implementation. Average node degree
for both P2P networks was set to 12. We measured the
cost of publishing messages for each of the topics in the network. The experiments showed that the distribution trees
constructed by the strawman implementation included on
average 77% more uninterested peers than those constructed
by Magnet.
5. RELATED WORK
Several publish/subscribe systems based on structured overlays have been proposed in the past, notably Scribe [6] and
Bayeux [36]. Generally speaking, none of these systems attempt to cluster peers based on their subscription similarity.
An exception is TERA [4], which creates an overlay for each
topic to accelerate dissemination. The scalability of TERA,
however, is limited when the number of topics subscribed to
by nodes is large.
(a) Varying the initFrac parameter. (b) Varying the avg. subscription size Z per peer.
[Plots: cost savings using Magnet (%) vs. the λ parameter, for initFrac = 10%, 20%, 30% (left) and Z = 8, 16, 32 subscriptions per peer on average (right).]
Figure 8: Hierarchical-Topics model: Magnet’s performance improves with increased subscription correlation
(higher value of λ). Higher values of initFrac imply fewer exploitable correlations, supporting the trend seen
on the curve on the left. Increasing the average degree Z while keeping initFrac constant incorporates
randomness from the initialization stage as suggested by the decline on the right. We see that Magnet brings
significant cost savings over the strawman approach on the Hierarchical-Topics model.
The benefits of clustering peer subscriptions have been
investigated in the context of unstructured overlays [31, 22,
9, 3]. In particular, Sub-2-Sub [31] achieved clustering by
organizing the subscribers to each topic into a separate ring
structure. This approach, however, results in overlays whose
average degree grows linearly with the average subscription
size, which limits scalability. Rappel [22] provides a feed-based pub/sub service using gossip-like mechanisms and exploits interest similarity to avoid messages being received
by uninterested nodes. Due to the assumption that each
topic has only a single publisher, as is the case in feed-based
systems, Rappel differs fundamentally from our work.
The idea of exploiting subscription similarity to reduce the
space per node requirements of the clustering was explored
in Spidercast [9] and Data-aware multicast [3]. The trade-off
between the node degree and the clustering quality has been
addressed in a theoretical study [8].
Topic clustering [21, 27, 2, 29] looks into amortizing overheads associated with message dissemination in large pub/sub
systems by aggregating multiple topics into larger groups (or
channels). It was first introduced in [2] in the context of optimal assignment of multicast groups to multicast addresses,
and subsequently extended to general purpose pub/sub systems in [27, 29]. The existing solutions to topic clustering rely on approximation techniques (such as k-means [27])
whose convergence depends on the accurate common knowledge of the current assignment of topics to channels. They
are not easy to implement in a decentralized fashion [25].
6. CONCLUSION
In this paper, we have presented Magnet, an overlay-based infrastructure for scalable topic-based pub/sub which
takes advantage of the existing subscription correlation patterns among the subscribers. The technique we proposed
allows the formation of clusters of peers with similar interests in the underlying topology, which enables the construction of efficient dissemination structures (specifically
spanning trees) that are known to be robust (e.g., Scribe,
Bayeux). However, the clustering process leads to non-uniform peer identifier distributions, which renders all name-independent DHT solutions (e.g., Chord, Pastry) unusable.
Therefore, Magnet employs the Oscar overlay as the underlying topology which is provably small-world and can efficiently operate with arbitrary distribution scenarios. Because of its inherent small-world design, Magnet scales well
with the number of nodes, and ensures fixed network degree
regardless of the number of topics or the size of subscriptions.
We simulate Magnet on a variety of subscription models,
including a novel one, as well as on real-life subscription patterns from Wikipedia. Our experiments show that Magnet
is able to achieve significant savings — sometimes up to 80%
— of the message dissemination costs over a strawman running a typical peer-to-peer publish/subscribe system based
on name-independent DHTs. Furthermore, we demonstrate
that this cost reduction is adaptive to both the extent to
which the individual node subscriptions correlate, and the
amount of information about the other node subscriptions
available to each node. In particular, in the worst case scenario when subscriptions are completely uncorrelated or unknown or both, the message dissemination costs are no worse
than those of name-independent DHT systems.
Our findings suggest that subscription clustering techniques can detect and exploit correlation at low cost, and
may improve the performance of large-scale publish/subscribe
systems for a variety of settings.
7. ACKNOWLEDGMENTS
We thank our shepherd and the anonymous reviewers for
helpful feedback. This work is partially supported by EU
IST Project CoMiFin FP7-ICT-225407/2008 and partially
carried out within the SICS Center for Networked Systems
funded by VINNOVA, SSF, KKS, ABB, Ericsson, Saab Systems, TeliaSonera, T2Data, Vendolocus and Peerialism.
8. REFERENCES
[1] I. Abraham, D. Malkhi, and O. Dobzinski. LAND:
Stretch (1 + ε) locality-aware networks for DHTs. In
Proc. 15th Ann. ACM-SIAM Symp. on Discrete
Algorithms (SODA), pages 550–559, 2004.
[2] M. Adler, Z. Ge, J. F. Kurose, D. F. Towsley, and
S. Zabele. Channelization problem in large scale data
dissemination. In ICNP 2001: Proceedings of the
Ninth International Conference on Network Protocols,
page 100, Washington, DC, USA, 2001. IEEE
Computer Society.
[3] S. Baehni, P. T. Eugster, and R. Guerraoui.
Data-aware multicast. In DSN ’04: Proceedings of the
2004 International Conference on Dependable Systems
and Networks, page 233, Washington, DC, USA, 2004.
IEEE Computer Society.
[4] R. Baldoni, R. Beraldi, V. Quema, L. Querzoni, and
S. Tucci-Piergiovanni. Tera: topic-based event routing
for peer-to-peer architectures. In DEBS ’07:
Proceedings of the 2007 inaugural international
conference on Distributed event-based systems, pages
2–13, New York, NY, USA, 2007. ACM.
[5] A. Bharambe, M. Agrawal, and S. Seshan. Mercury:
Supporting scalable multi-attribute range queries. In
ACM SIGCOMM, Portland, USA, 2004.
[6] M. Castro, P. Druschel, A.-M. Kermarrec, and
A. Rowstron. SCRIBE: a large-scale and decentralized
application-level multicast infrastructure. IEEE J.
Selected Areas in Comm. (JSAC), 20(8):1489–1499,
2002.
[7] M. Cha, P. Rodriguez, J. Crowcroft, S. Moon, and
X. Amatriain. Watching television over an IP network.
In IMC ’08: Proceedings of the 8th ACM SIGCOMM
conference on Internet measurement, pages 71–84,
New York, NY, USA, 2008. ACM.
[8] G. Chockler, R. Melamed, Y. Tock, and R. Vitenberg.
Constructing scalable overlays for pub-sub with many
topics. In PODC ’07: Proceedings of the twenty-sixth
annual ACM symposium on Principles of distributed
computing, pages 109–118, New York, NY, USA, 2007.
ACM.
[9] G. Chockler, R. Melamed, Y. Tock, and R. Vitenberg.
SpiderCast: A Scalable Interest-Aware Overlay for
Topic-Based Pub/Sub Communication. In 1st
International Conference on Distributed Event-Based
Systems (DEBS). ACM, June 2007.
[10] D. J. Crandall, D. Cosley, D. P. Huttenlocher, J. M.
Kleinberg, and S. Suri. Feedback effects between
similarity and social influence in online communities.
In KDD, pages 160–168. ACM, 2008.
[11] S. Girdzijauskas, A. Datta, and K. Aberer. On small
world graphs in non-uniformly distributed key spaces.
In NetDB2005, Tokyo, Japan, 2005.
[12] S. Girdzijauskas, A. Datta, and K. Aberer. Oscar:
Small-world overlay for realistic key distributions. In
DBISP2P 2006, Seoul, Korea, 2006.
[13] S. Girdzijauskas, A. Datta, and K. Aberer. Oscar: A
Data-Oriented Overlay For Heterogeneous
Environments. In ICDE 2007, Istanbul, Turkey, 2007.
[14] S. Girdzijauskas, A. Datta, and K. Aberer. Structured
overlay for heterogeneous environments: Design and
evaluation of oscar. ACM Transactions on
Autonomous and Adaptive Systems (TAAS), Volume
5, February 2010.
[15] R. Guerraoui, S. B. Handurukande, K. Huguenin,
A.-M. Kermarrec, F. L. Fessant, and E. Riviere.
GosSkip, an efficient, fault-tolerant and self organizing
overlay using gossip-based construction and skip-lists
principles. IEEE International Conference on
Peer-to-Peer Computing, 0:12–22, 2006.
[16] J. Kleinberg. The Small-World Phenomenon: An
Algorithmic Perspective. In Proceedings of the 32nd
ACM Symposium on Theory of Computing, 2000.
[17] J. Kleinberg. Complex networks and decentralized
search algorithms. Proceedings of the International
Congress of Mathematicians (ICM), 2006.
[18] B. Knutsson, H. Lu, W. Xu, and B. Hopkins.
Peer-to-peer support for massively multiplayer games.
In INFOCOM 2004: Twenty-third Annual Joint
Conference of the IEEE Computer and
Communications Societies, volume 1, 2004.
[19] M. Mitzenmacher. The power of two choices in
randomized load balancing. IEEE Transactions on
Parallel and Distributed Systems, 12(10):1094–1104,
2001.
[20] M. E. J. Newman. The structure and function of
complex networks. SIAM Review, 45:167–256, 2003.
[21] K. Ostrowski, K. Birman, and D. Dolev. Live
distributed objects: Enabling the active web. IEEE
Internet Computing, 11(6), p. 72, 2007.
[22] J. A. Patel, E. Riviere, I. Gupta, and A.-M.
Kermarrec. Rappel: Exploiting interest and network
locality to improve fairness in publish-subscribe
systems. Computer Networks, 53(13):2304–2320,
2009. Gossiping in Distributed Systems.
[23] T. Qiu, Z. Ge, S. Lee, J. Wang, Q. Zhao, and J. Xu.
Modeling channel popularity dynamics in a large
IPTV system. In SIGMETRICS ’09: Proceedings of
the eleventh international joint conference on
Measurement and modeling of computer systems,
pages 275–286, New York, NY, USA, 2009. ACM.
[24] A. Rowstron and P. Druschel. Pastry: Scalable,
distributed object location and routing for large-scale
peer-to-peer systems. In IFIP/ACM International
Conference on Distributed Systems Platforms
(Middleware), Heidelberg, Germany, 2001.
[25] A. Shraer, G. Chockler, I. Keidar, R. Melamed,
Y. Tock, and R. Vitenberg. Local on-line maintenance
of scalable pub/sub infrastructure. In DSN 2007: The
37th Annual IEEE/IFIP International Conference
on Dependable Systems and Networks, page 100,
Edinburgh, UK, 2007.
[26] I. Stoica, R. Morris, D. R. Karger, M. F. Kaashoek,
and H. Balakrishnan. Chord: A scalable peer-to-peer
lookup service for internet applications. In
SIGCOMM, pages 149–160, 2001.
[27] Y. Tock, N. Naaman, A. Harpaz, and G. Gershinsky.
Hierarchical clustering of message flows in a multicast
data dissemination system. In 17th IASTED
International Conference on Parallel and Distributed
Computing and Systems, pages 320–327, 2005.
[28] Y. Vigfusson. Affinity in Distributed Systems. PhD
thesis, Cornell University, 2009.
[29] Y. Vigfusson, H. Abu-Libdeh, M. Balakrishnan,
K. Birman, R. Burgess, G. Chockler, H. Li, and
Y. Tock. Dr. Multicast: Rx for Data Center
Communication Scalability. In EuroSys ’10:
Proceedings of the ACM SIGOPS/EuroSys European
Conference on Computer Systems, April 2010.
[30] K.-H. Vik, C. Griwodz, and P. Halvorsen.
Applicability of group communication for increased
scalability in MMOGs. In NetGames ’06: Proceedings
of 5th ACM SIGCOMM workshop on Network and
system support for games, page 2, New York, NY,
USA, 2006. ACM.
[31] S. Voulgaris, E. Riviere, A.-M. Kermarrec, and M. van
Steen. Sub-2-sub: Self-organizing content-based
publish subscribe for dynamic large scale collaborative
networks. In IPTPS, 2006.
[32] S. Weiss, P. Urso, and P. Molli. Wooki: A P2P
wiki-based collaborative writing tool. In WISE,
volume 4831 of Lecture Notes in Computer Science,
pages 503–512. Springer, 2007.
[33] B. Wong, Y. Vigfússon, and E. G. Sirer. Hyperspaces
for object clustering and approximate matching in
peer-to-peer overlays. In HotOS’07: Proceedings of the
11th USENIX Workshop on Hot Topics in Operating
Systems, pages 1–6, Berkeley, CA, USA, 2007.
USENIX Association.
[34] T. Wong, R. Katz, and S. Mccanne. An evaluation of
preference clustering in large-scale multicast
applications. In Proceedings of IEEE INFOCOM,
pages 451–460, 2000.
[35] B. Y. Zhao, J. D. Kubiatowicz, and A. D. Joseph.
Tapestry: An infrastructure for fault-tolerant wide-area
location and routing. Technical Report
UCB/CSD-01-1141, UC Berkeley, 2001.
[36] S. Q. Zhuang, B. Y. Zhao, A. D. Joseph, R. H. Katz,
and J. D. Kubiatowicz. Bayeux: An architecture for
scalable and fault-tolerant wide-area data
dissemination. In NOSSDAV ’01, pages 11–20, 2001.