Peer-to-Peer Networks: Jeff Pang
Intro
Quickly grown in popularity
Centralized Database
Napster
Query Flooding
Gnutella
Intelligent Query Flooding
KaZaA
Swarming
BitTorrent
Unstructured Overlay Routing
Freenet
Structured Overlay Routing
Distributed Hash Tables (Chord)
The Lookup Problem
(diagram: nodes N1–N6 form an overlay across the Internet; a publisher holds Key=“title”, Value=MP3 data, and a client issues Lookup(“title”) — how does the query find the publisher?)
The Lookup Problem (2)
Common Primitives:
Join: how do I begin participating?
Publish: how do I advertise my file?
Search: how do I find a file?
Fetch: how do I retrieve a file?
Napster: History
Centralized Database:
Join: on startup, client contacts the central server
Publish: client reports its list of files to the central server
Search: query the server => returns someone that stores the requested file
Fetch: get the file directly from a peer
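The four primitives above can be sketched with a toy central index (class and method names are illustrative, not Napster's actual protocol):

```python
# Toy sketch of a Napster-style centralized index (names illustrative).
class CentralIndex:
    def __init__(self):
        self.index = {}  # filename -> set of peer addresses

    def publish(self, peer_addr, filenames):
        """Join/Publish: a peer reports the files it stores."""
        for name in filenames:
            self.index.setdefault(name, set()).add(peer_addr)

    def search(self, filename):
        """Search: return peers storing the file; Fetch is then peer-to-peer."""
        return sorted(self.index.get(filename, set()))

server = CentralIndex()
server.publish("123.2.21.23", ["X", "Y", "Z"])
server.publish("123.2.0.18", ["A"])
print(server.search("A"))  # ['123.2.0.18']
```

Note that the server only stores metadata; the file bytes never pass through it — which is why its state is O(N) lists rather than O(N) files.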
Napster: Publish
(diagram: peer 123.2.21.23 announces “I have X, Y, and Z!” to the central server, which records insert(X, 123.2.21.23), ...)
Napster: Search
(diagram: client asks the central server “Where is file A?”; the Query Reply is search(A) --> 123.2.0.18, and the client then fetches file A directly from 123.2.0.18)
Napster: Discussion
Pros:
Simple
Cons:
Server maintains O(N) state
Server is a single point of failure
Gnutella: History
Query Flooding:
Join: on startup, client contacts a few other nodes; these become its “neighbors”
Publish: no need
Search: ask neighbors, who ask their neighbors, and so on... when/if found, reply to sender.
Fetch: get the file directly from the peer
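The flooding search above can be sketched as a breadth-first walk over the neighbor graph, bounded by a TTL (the graph, file placement, and TTL value are all illustrative):

```python
# Toy sketch of Gnutella-style query flooding over a neighbor graph.
def flood_search(graph, files, start, wanted, ttl=4):
    """Ask neighbors, who ask their neighbors, ... up to `ttl` hops out."""
    frontier, seen, hits = [start], {start}, []
    for _ in range(ttl + 1):
        next_frontier = []
        for node in frontier:
            if wanted in files.get(node, ()):
                hits.append(node)          # reply routed back to the sender
            for nbr in graph.get(node, ()):
                if nbr not in seen:
                    seen.add(nbr)
                    next_frontier.append(nbr)
        frontier = next_frontier
    return hits

graph = {"n1": ["n2", "n3"], "n2": ["n4"], "n3": ["n4"], "n4": []}
files = {"n4": ["A"]}
print(flood_search(graph, files, "n1", "A"))  # ['n4']
```

The TTL is what keeps the flood from covering the entire network, at the cost of possibly missing files beyond the horizon.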
Gnutella: Search
(diagram: the query “Where is file A?” floods from neighbor to neighbor; nodes holding file A answer “I have file A.” and the Reply is routed back along the query path)
Gnutella: Discussion
Pros:
Fully decentralized
Search cost distributed
Cons:
Search scope is O(N)
Search time is O(log_b(N)), where b is the average outdegree
Nodes leave often, so the network is unstable
Aside: Search Time?
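A quick sanity check of the O(log_b(N)) claim: a flood with average outdegree b reaches roughly b^d nodes after d hops, so covering N nodes takes about log_b(N) hops (the numbers below are illustrative):

```python
import math

# Back-of-the-envelope check of Gnutella search time: a flood with
# average outdegree b reaches ~b**d nodes after d hops, so covering
# N nodes takes roughly log_b(N) hops.
N = 1_000_000   # network size (illustrative)
b = 4           # average outdegree (illustrative)
hops = math.ceil(math.log(N, b))
print(hops)  # 10
```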
Aside: All Peers Equal?
(diagram: peers in the overlay have very different access links — 56kbps modems, 1.5Mbps DSL, a 10Mbps LAN)
Aside: Network Resilience
KaZaA: History
In 2001, KaZaA was created by the Dutch company Kazaa BV.
Founders: Niklas Zennström and Janus Friis. (Heard of Skype? They wrote it too.)
Single network called FastTrack, used by other clients as well: Morpheus, giFT, etc.
Eventually the protocol changed so other clients could no longer talk to it.
Most popular file sharing network today, with >10 million users (number varies)
KaZaA: Overview
(diagram: an ordinary peer at 123.2.21.23 publishes “I have X!” to its supernode, which records insert(X, 123.2.21.23), ...)
KaZaA: File Search
(diagram: the client asks its supernode “Where is file A?”; supernodes flood search(A) among themselves, and Query Replies report search(A) --> 123.2.0.18 and search(A) --> 123.2.22.50 as holders of file A)
KaZaA: Fetching
More than one node may have the requested file...
How to tell?
Must be able to distinguish identical files
How to fetch?
Get bytes [0..1000] from A, [1001...2000] from B
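The usual answer to "how to tell?" is a content hash, and to "how to fetch?" a byte-range split across peers. FastTrack reportedly used its own hash (UUHash); SHA-1 below, and the peer names, are illustrative stand-ins:

```python
import hashlib

# Sketch: identify identical files by content hash, then split the
# fetch across peers by byte range (SHA-1 and peer names illustrative).
def file_id(data: bytes) -> str:
    """Two files are 'the same' iff their content hashes match."""
    return hashlib.sha1(data).hexdigest()

def plan_ranges(size: int, peers: list, chunk: int = 1000):
    """Assign byte ranges round-robin: [0..999] to A, [1000..1999] to B, ..."""
    plan = []
    for i, start in enumerate(range(0, size, chunk)):
        end = min(start + chunk, size) - 1
        plan.append((peers[i % len(peers)], start, end))
    return plan

print(plan_ranges(2500, ["A", "B"]))
# [('A', 0, 999), ('B', 1000, 1999), ('A', 2000, 2499)]
```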
KaZaA: Discussion
Pros:
Tries to take into account node heterogeneity:
Bandwidth
Host Computational Resources
Host Availability (?)
Rumored to take into account network locality
Cons:
Mechanisms easy to circumvent
BitTorrent: History
Swarming:
Join: contact centralized “tracker” server, get a list of peers.
Publish: run a tracker server.
Search: out-of-band. E.g., use Google to find a tracker for the file you want.
Fetch: download chunks of the file from your peers. Upload chunks you have to them.
BitTorrent: Fetch
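Chunk selection during the fetch can be sketched as rarest-first, the piece-selection heuristic BitTorrent clients commonly use (peer names and chunk holdings below are illustrative):

```python
from collections import Counter

# Toy sketch of swarming piece selection: download the chunk we lack
# that the fewest peers hold ("rarest first"), so rare pieces spread.
def rarest_first(have: set, peers: dict):
    """Return the rarest missing chunk id, or None if nothing is missing."""
    counts = Counter(
        c for chunks in peers.values() for c in chunks if c not in have
    )
    if not counts:
        return None
    # break ties by chunk id so the choice is deterministic
    return min(counts, key=lambda c: (counts[c], c))

peers = {"p1": {0, 1, 2}, "p2": {0, 1}, "p3": {0}}
print(rarest_first({0}, peers))  # 2  (only p1 holds chunk 2)
```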
BitTorrent: Sharing Strategy
Employ a “tit-for-tat” sharing strategy: upload to peers that upload to you.
Pros:
Works reasonably well in practice
Gives peers an incentive to share resources; avoids freeloaders
Cons:
Pareto efficiency is a relatively weak condition
Aside: Static BitTorrent-Like Networks
(diagram: a source splits the file into pieces a, b, c, sends a different piece to each node, and the nodes exchange pieces until every node has a, b, c)
Is this scheme optimal?
Scenario 1: All nodes have the same capacities. Convince yourself that partitioning packets equally among nodes as above is optimal, i.e. it minimizes the time for all the nodes to download the file.
Freenet: History
…
If file id stored locally, then stop
Forward data back to upstream requestor
If not, search for the “closest” id in the table, and forward the message to the corresponding next_hop
…
If data is not found, failure is reported back
Requestor then tries next closest match in routing table
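The steps above can be sketched as a recursive closest-key search with backtracking (node names, ids, and routing tables below are illustrative):

```python
# Sketch of Freenet-style routing: forward the query to the table entry
# whose key is closest to the requested id; on failure, backtrack and
# try the next-closest match.
def freenet_route(tables, store, node, key, visited=None):
    visited = visited if visited is not None else {node}
    if key in store.get(node, set()):
        return [node]                         # found: data flows back upstream
    # try table entries in order of closeness to the requested key
    for k, next_hop in sorted(tables.get(node, []), key=lambda e: abs(e[0] - key)):
        if next_hop in visited:
            continue
        visited.add(next_hop)
        path = freenet_route(tables, store, next_hop, key, visited)
        if path:
            return [node] + path
    return None                               # failure reported back

tables = {"n1": [(9, "n2"), (4, "n3")], "n2": [(10, "n4")], "n3": []}
store = {"n4": {10}}
print(freenet_route(tables, store, "n1", 10))  # ['n1', 'n2', 'n4']
```

Note the search is depth-first along "closest id" edges, so only nodes on the search path are involved — no flooding.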
Freenet: Routing
(diagram: query(10) is routed hop by hop through nodes n1–n6 using each node's routing table of (id, next_hop, file) entries — always toward the entry whose id is closest to 10, backtracking on failure — until the node storing f10 is reached, and the data is passed back along the query path)
Freenet: Routing Properties
Pros:
Intelligent routing makes queries relatively short
Search scope small (only nodes along search path
involved); no flooding
Anonymity properties may give you “plausible deniability”
Cons:
Still no provable guarantees!
Anonymity features make it hard to measure, debug
DHT: History
Publication contains the actual file => fetch from where the query stops
Publication says “I have file X” => the query tells you 128.2.1.3 has X; use IP routing to get X from 128.2.1.3
DHT: Example - Chord
Properties:
Routing table size is O(log N), where N is the total number of nodes
Guarantees that a file is found in O(log N) hops
(diagram: identifier ring with nodes N105 and N90 and keys K5, K20, K80)
A key is stored at its successor: the node with the next higher ID
DHT: Chord Basic Lookup
(diagram: node N10 asks “Where is key 80?”; the query is passed around the ring N10 → N60 → N90; key K80 is stored at N90, the successor of 80; other nodes shown: N105, N120)
DHT: Chord “Finger Table”
(diagram: node N80's fingers span 1/2, 1/4, 1/8, 1/16, 1/32, 1/64, 1/128 of the ring)
Entry i in the finger table of node n is the first node that succeeds or equals n + 2^i
In other words, the ith finger points 1/2^(n-i) of the way around the ring
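A minimal sketch of the successor and finger-table rules above, on a 3-bit ring with the four nodes from the join example that follows (illustrative; real Chord also keeps successor lists and runs stabilization):

```python
# Chord-style successor and finger table on an m-bit identifier ring.
m = 3
nodes = sorted([1, 2, 0, 6])  # node ids from the join example

def successor(ident):
    """The node responsible for `ident`: first node id >= ident, wrapping."""
    ident %= 2 ** m
    for n in nodes:
        if n >= ident:
            return n
    return nodes[0]              # wrap past the top of the ring

def fingers(n):
    """Finger i of node n: the first node that succeeds or equals n + 2**i."""
    return [successor(n + 2 ** i) for i in range(m)]

print(fingers(1))    # [2, 6, 6]  -- successors of targets 2, 3, 5
print(successor(7))  # 0          -- so item f7 is stored at node n0
```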
DHT: Chord Join
(diagrams: nodes n1, n2, n0, and n6 join an identifier circle with IDs 0–7 one at a time; after each join the successor tables are updated — entry i of node n's table gives the successor of n + 2^i, for i = 0, 1, 2)
With all four nodes present, the successor tables are:
n1: (2→2, 3→6, 5→6)
n2: (3→6, 4→6, 6→6)
n6: (7→0, 0→0, 2→2)
n0: (1→1, 2→2, 4→6)
Items f7 and f1 are stored at n0 and n1, the successors of ids 7 and 1.
DHT: Chord Routing
Upon receiving a query for item id, a node:
Checks whether it stores the item locally
If not, forwards the query to the largest node in its successor table that does not exceed id
(diagram: query(7) issued at n1 is forwarded n1 → n6 → n0; n0, the successor of 7, stores item f7)
DHT: Chord Summary
Pros:
Guaranteed Lookup
O(log N) per node state and search scope
Cons:
No one uses them? (only one file sharing app)
Supporting non-exact match search is hard
P2P: Summary
Many different styles; remember the pros and cons of each: centralized, flooding, swarming, unstructured and structured routing
Lessons learned:
Single points of failure are very bad