NETS 212: Scalable and Cloud Computing
Graph Algorithms in MapReduce
October 15, 2013
2013 A. Haeberlen, Z. Ives
University of Pennsylvania
Announcements
Beyond average/sum/count
CIS 320 (algorithms), CIS 391/520 (AI), CIS 455 (Web Systems)
Computation model
Iterative MapReduce
A toolbox of algorithms
[Figure: example social graph. Nodes: Alice, Sunita, Jose, Mikhail, Magna Carta. Edges: friend-of and fan-of.]
(Alice, Facebook)
(Alice, Sunita)
(Jose, Magna Carta)
(Jose, Sunita)
(Mikhail, Facebook)
(Mikhail, Magna Carta)
(Sunita, Facebook)
(Sunita, Alice)
(Sunita, Jose)
[Figure: the same social graph with a weight on each friend-of and fan-of edge (values shown: 0.3, 0.5, 0.7, 0.8, 0.9)]
We can encode the graph in various ways
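As an illustration of two common encodings, the sketch below takes the edge list shown earlier and converts it into an adjacency list. The function name `to_adjacency_list` is a hypothetical helper, not something from the slides, and edge labels (friend-of, fan-of) are left out for brevity.

```python
from collections import defaultdict

# Edge-list encoding: one (source, target) pair per edge, as listed earlier.
edges = [
    ("Alice", "Facebook"), ("Alice", "Sunita"),
    ("Jose", "Magna Carta"), ("Jose", "Sunita"),
    ("Mikhail", "Facebook"), ("Mikhail", "Magna Carta"),
    ("Sunita", "Facebook"), ("Sunita", "Alice"), ("Sunita", "Jose"),
]

def to_adjacency_list(edge_list):
    """Adjacency-list encoding: map each source node to its list of targets."""
    adjacency = defaultdict(list)
    for source, target in edge_list:
        adjacency[source].append(target)
    return dict(adjacency)

adjacency = to_adjacency_list(edges)
for node, neighbors in adjacency.items():
    print(node, "->", neighbors)
```

The adjacency list groups all outgoing edges of a node into one record, which is convenient when a mapper needs to see a node together with its neighbors.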
NEXT
Computation model
Iterative MapReduce
A toolbox of algorithms
[Figure: weighted social graph with friend-of and fan-of edges]
Slightly more technical: How many of my friends have me as their best friend?
[Figure: friend-of edges among Alice, Sunita, and Jose, with weights 0.9 and 0.3]
Step #1:
Step #2:
Step #3:
Step #4:
reduce(key: ________, values: list of _________)
{
}
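One possible way to fill in the blanks, sketched in Python as two simulated MapReduce rounds. This is an illustration under stated assumptions, not the slides' own solution: the edge weights below are hypothetical, friend-of edges are assumed to be stored in both directions, a person's "best friend" is taken to be the friend with the highest edge weight, and `run_mapreduce` is a toy in-memory stand-in for a real framework.

```python
from collections import defaultdict

def run_mapreduce(records, mapper, reducer):
    """Toy single-process MapReduce: map, group by key, then reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    output = []
    for key in sorted(groups):
        output.extend(reducer(key, groups[key]))
    return output

# Hypothetical weighted friend-of edges: (person, friend, weight).
friend_edges = [
    ("Alice", "Sunita", 0.9), ("Sunita", "Alice", 0.3),
    ("Sunita", "Jose", 0.9), ("Jose", "Sunita", 0.3),
]

# Round 1: find each person's best friend (highest-weight edge).
def map_edges(edge):
    person, friend, weight = edge
    yield person, (friend, weight)

def reduce_best_friend(person, friends):
    best, _ = max(friends, key=lambda fw: fw[1])
    yield best, person  # "person has best as their best friend"

# Round 2: count, for each person, how many friends picked them.
def map_votes(pair):
    best, _admirer = pair
    yield best, 1

def reduce_count(best, ones):
    yield best, sum(ones)

best_friend_pairs = run_mapreduce(friend_edges, map_edges, reduce_best_friend)
best_friend_counts = run_mapreduce(best_friend_pairs, map_votes, reduce_count)
print(best_friend_counts)
```

People who are nobody's best friend (Alice, in this data) simply produce no output record; a real job would need an extra step if explicit zero counts are required.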
Friend recommendation!
Generalizing
Example: How many of my friends' friends (distance-2 neighbors) have me as their best friend's best friend?
What do we need to do?
Iterative MapReduce
Computation model
Iterative MapReduce
A toolbox of algorithms
NEXT
Path-based algorithms
SSSP: Intuition
bestDistanceAndPath(v) {
  if (v == source) then {
    return <distance 0, path [v]>
  } else {
    // consider every node u that has an edge to v
    find the u that minimizes bestDistanceAndPath(u).distance + dist[u,v]
    return <bestDistanceAndPath(u).distance + dist[u,v], bestDistanceAndPath(u).path + v>
  }
}
Q = {s,a,b,c,d}
spSet = {}
dist_S_To: {(a,∞), (b,∞), (c,∞), (d,∞)}
predecessor: {(a,nil), (b,nil), (c,nil), (d,nil)}
Q = {a,b,c,d}
spSet = {s}
dist_S_To: {(a,10), (b,∞), (c,5), (d,∞)}
predecessor: {(a,s), (b,nil), (c,s), (d,nil)}
Q = {a,b,d}
spSet = {c,s}
dist_S_To: {(a,8), (b,14), (c,5), (d,7)}
predecessor: {(a,c), (b,c), (c,s), (d,c)}
Q = {a,b}
spSet = {c,d,s}
dist_S_To: {(a,8), (b,13), (c,5), (d,7)}
predecessor: {(a,c), (b,d), (c,s), (d,c)}
Q = {b}
spSet = {a,c,d,s}
dist_S_To: {(a,8), (b,9), (c,5), (d,7)}
predecessor: {(a,c), (b,a), (c,s), (d,c)}
Q = {}
spSet = {a,b,c,d,s}
dist_S_To: {(a,8), (b,9), (c,5), (d,7)}
predecessor: {(a,c), (b,a), (c,s), (d,c)}
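The trace above has the shape of Dijkstra's algorithm: repeatedly move the closest unfinished node from Q into spSet and relax its outgoing edges. The sketch below is a minimal Python version. The edge weights are an assumption, chosen because they reproduce the dist_S_To and predecessor values in the trace; the figure may contain additional edges that do not affect the result.

```python
import heapq

def dijkstra(graph, source):
    """Return (dist, pred) for single-source shortest paths from source."""
    dist = {v: float("inf") for v in graph}
    pred = {v: None for v in graph}
    dist[source] = 0
    finished = set()              # plays the role of spSet
    queue = [(0, source)]         # priority queue of (distance, node)
    while queue:
        d, u = heapq.heappop(queue)
        if u in finished:
            continue              # stale queue entry, already settled
        finished.add(u)
        for v, weight in graph[u].items():
            if d + weight < dist[v]:   # relax edge u -> v
                dist[v] = d + weight
                pred[v] = u
                heapq.heappush(queue, (dist[v], v))
    return dist, pred

# Assumed edge weights (reconstructed from the trace, not given in the text).
graph = {
    "s": {"a": 10, "c": 5},
    "a": {"b": 1},
    "b": {},
    "c": {"a": 3, "b": 9, "d": 2},
    "d": {"b": 6},
}

dist, pred = dijkstra(graph, "s")
print(dist)
print(pred)
```

Note that this algorithm is inherently sequential: it settles one node at a time, which is why a different formulation is needed for MapReduce.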
init:
map:
reduce:
Annotations on the per-node record:
The shortest path we have found so far from the source to nodeID has length ...
... this is the next hop on that path ...
... and here is the adjacency list for nodeID
[Figure: example graph with source s at distance 0, labeled "Wave"]
Iteration 1
mapper: (a,<s,10>) (c,<s,5>) (a,<c,8>) (c,<a,9>) (b,<a,11>) (b,<c,14>) (d,<c,7>) edges
reducer: (a,<8, ...>) (c,<5, ...>) (b,<11, ...>) (d,<7, ...>)
Iteration 2
mapper: (a,<s,10>) (c,<s,5>) (a,<c,8>) (c,<a,9>) (b,<a,11>) (b,<c,14>) (d,<c,7>) (b,<d,13>) (d,<b,15>) edges
reducer: (a,<8>) (c,<5>) (b,<11>) (d,<7>)
Iteration 3
No change! Convergence!
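The iterations above can be sketched as a loop of simulated MapReduce rounds in Python. This is a rough illustration, not the slides' exact job: the record layout (distance, predecessor, adjacency list), the tags "graph" and "dist", and the edge weights are all assumptions, and the weights may differ from the graph in the figure. Each mapper forwards the node's adjacency list, re-emits its current best distance, and proposes a distance to every neighbor; each reducer keeps the minimum.

```python
from collections import defaultdict

INF = float("inf")

def sssp_map(node, record):
    dist, pred, adj = record
    yield node, ("graph", adj)           # pass the graph structure along
    yield node, ("dist", dist, pred)     # keep the best distance so far
    if dist < INF:
        for neighbor, weight in adj.items():
            yield neighbor, ("dist", dist + weight, node)

def sssp_reduce(node, values):
    adj, best, pred = {}, INF, None
    for value in values:
        if value[0] == "graph":
            adj = value[1]
        elif value[1] < best:
            best, pred = value[1], value[2]
    return node, (best, pred, adj)

def sssp(graph, source):
    state = {v: (0 if v == source else INF, None, adj)
             for v, adj in graph.items()}
    rounds = 0
    while True:
        rounds += 1
        groups = defaultdict(list)
        for node, record in state.items():
            for key, value in sssp_map(node, record):
                groups[key].append(value)
        new_state = dict(sssp_reduce(k, vs) for k, vs in groups.items())
        if new_state == state:           # no record changed: converged
            return new_state, rounds
        state = new_state

# Assumed edge weights for illustration.
graph = {
    "s": {"a": 10, "c": 5},
    "a": {"b": 1},
    "b": {},
    "c": {"a": 3, "b": 9, "d": 2},
    "d": {"b": 6},
}

final_state, rounds = sssp(graph, "s")
distances = {node: record[0] for node, record in final_state.items()}
print(distances, "after", rounds, "rounds")
```

The loop stops only when no record changes anywhere in the graph, which is a global condition checked by the driver rather than by any single mapper or reducer.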
Question: If a vertex's path cost is the same in two consecutive rounds, can we be sure that this vertex has converged?
Summary: SSSP
Computation model
Iterative MapReduce
A toolbox of algorithms
Learning (clustering / classification)
[Figure: items grouped into clusters; axis label: Expenses]
Approach: k-Means
$$m_i^{(t+1)} = \frac{1}{|S_i^{(t)}|} \sum_{x_j \in S_i^{(t)}} x_j$$
[Figure: scatter plot with axes Age and Expenses; data points (10,10), (15,12), (11,16), (18,20), (20,21), (30,21)]
[Figure: the same six points, with randomly chosen initial centers marked]
[Figure: the same six points, with centers at (12.5,11) and (19.75,19.5)]
[Figure: the same six points, with centers at (12,12.67) and (22.67,20.67). Stable!]
k-Means in MapReduce
Map #1:
Reduce #1:
Map #2:
Reduce #2:
Each centroid will need to know where all the other centroids are
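As a rough illustration, k-means can be simulated in Python as repeated map (assign each point to its nearest centroid) and reduce (average the points of each cluster) rounds. This is a simplified sketch, not the two-round structure outlined above: it uses a single map/reduce pair per iteration and hands the full list of centroids to every mapper call. The initial centers (10,10) and (11,16) are an assumption; the slides only say they were chosen randomly, but this choice reproduces the centers shown in the example figures.

```python
from collections import defaultdict

def kmeans_map(point, centers):
    """Emit (index of the nearest center, point)."""
    def squared_distance(i):
        cx, cy = centers[i]
        return (point[0] - cx) ** 2 + (point[1] - cy) ** 2
    return min(range(len(centers)), key=squared_distance), point

def kmeans_reduce(points):
    """New center = mean of the points assigned to the cluster."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def kmeans(points, centers):
    history = [list(centers)]
    while True:
        groups = defaultdict(list)
        for point in points:
            index, value = kmeans_map(point, centers)
            groups[index].append(value)
        # Keep the old center if a cluster received no points.
        new_centers = [kmeans_reduce(groups[i]) if groups[i] else centers[i]
                       for i in range(len(centers))]
        if new_centers == centers:       # stable: assignments did not change
            return history
        centers = new_centers
        history.append(centers)

points = [(10, 10), (15, 12), (11, 16), (18, 20), (20, 21), (30, 21)]
history = kmeans(points, [(10, 10), (11, 16)])   # assumed initial centers
for step, centers in enumerate(history):
    print(step, [(round(x, 2), round(y, 2)) for x, y in centers])
```

Passing all centroids to every mapper is one way to satisfy the requirement that each point (and each centroid) can see where the other centroids are; in a real job this is typically done with a small side file or broadcast variable.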
Computation model
Iterative MapReduce
A toolbox of algorithms
Classification
A simple example
"Won contract"
"Won award"
"Won the lottery"
"Unsubscribe"
"Millions of customers"
"Millions of dollars"
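$$p(\text{spam} \mid \text{containsXYZ}) = \frac{p(\text{containsXYZ} \mid \text{spam}) \, p(\text{spam})}{p(\text{containsXYZ})}$$
Two of these terms are labeled "Easy" to estimate.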
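To make Bayes' rule concrete, the sketch below estimates each term by counting over a tiny labeled corpus and then combines them. The messages, their labels, and the function name `spam_probability` are hypothetical and only serve as an illustration; a real classifier would combine many features and smooth the counts.

```python
from collections import Counter

def spam_probability(messages, phrase):
    """Estimate p(spam | message contains phrase) via Bayes' rule."""
    counts = Counter()
    for text, is_spam in messages:
        contains = phrase in text.lower()
        counts["total"] += 1
        counts["spam"] += is_spam
        counts["contains"] += contains
        counts["contains_and_spam"] += contains and is_spam
    if counts["spam"] == 0 or counts["contains"] == 0:
        return None                      # not enough data to estimate
    p_spam = counts["spam"] / counts["total"]
    p_contains = counts["contains"] / counts["total"]
    p_contains_given_spam = counts["contains_and_spam"] / counts["spam"]
    return p_contains_given_spam * p_spam / p_contains

# Hypothetical training data: (message text, is it spam?)
messages = [
    ("Won the lottery! Claim millions of dollars", True),
    ("Millions of dollars for you", True),
    ("Click to unsubscribe", True),
    ("We won the contract", False),
    ("Millions of customers trust us", False),
]

print(spam_probability(messages, "millions of dollars"))
print(spam_probability(messages, "won"))
```

All three terms reduce to counts over the training messages, which is what makes the computation a natural fit for counting-style MapReduce jobs.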
map 1:
reduce 1:
map 2:
reduce 2:
Stay tuned