04 Pagerank
10/6/2021 Jure Leskovec, Stanford CS224W: Machine Learning with Graphs, http://cs224w.stanford.edu
[Figure: example of linked entities — “I teach a class on Networks” (CS224W); classes are in the Gates building; Computer Science Department at Stanford; Stanford University]
How is the Web linked?
What is the “map” of the Web?
Web as a directed graph [Broder et al. 2000]:
▪ Given node v, what nodes can v reach?
▪ What other nodes can reach v?
Not all web pages are equally “important”
thispersondoesnotexist.com vs. www.stanford.edu
We will cover the following Link Analysis
approaches to compute the importance of
nodes in a graph:
▪ PageRank
▪ Personalized PageRank (PPR)
▪ Random Walk with Restarts
Idea: Links as votes
▪ Page is more important if it has more links
▪ In-coming links? Out-going links?
Think of in-links as votes:
▪ www.stanford.edu has 23,400 in-links
▪ thispersondoesnotexist.com has 1 in-link
Are all in-links equal?
▪ Links from important pages count more
▪ Recursive question!
A “vote” from an important page is worth more:
▪ Each link’s vote is proportional to the importance of its source page
▪ If page i with importance r_i has d_i out-links, each link gets r_i / d_i votes
▪ Page j’s own importance r_j is the sum of the votes on its in-links
Example: if j receives a link from page i (3 out-links) and from page k (4 out-links), then r_j = r_i/3 + r_k/4
A page is important if it is pointed to by other important pages.
Example: “The web in 1839”
Example graph (y links to y and a; a links to y and m; m links to a). Flow equations:
r_y = r_y/2 + r_a/2
r_a = r_y/2 + r_m
r_m = r_a/2
In matrix form, r = M · r with the column-stochastic matrix

        y   a   m
  y  [  ½   ½   0 ]
M = a[  ½   0   1 ]
  m  [  0   ½   0 ]
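The flow equations above can be checked numerically. A minimal plain-Python sketch that builds M for the y/a/m example and verifies that r = (2/5, 2/5, 1/5) — the solution with ranks summing to 1 — is a fixed point of r = M · r:

```python
# Column-stochastic matrix M for the y/a/m example.
# Column j splits page j's rank evenly over its out-links.
M = [
    [0.5, 0.5, 0.0],  # r_y = r_y/2 + r_a/2
    [0.5, 0.0, 1.0],  # r_a = r_y/2 + r_m
    [0.0, 0.5, 0.0],  # r_m = r_a/2
]

def matvec(M, r):
    """Compute the product M . r."""
    return [sum(M[i][j] * r[j] for j in range(len(r))) for i in range(len(M))]

# Solving the flow equations with the constraint r_y + r_a + r_m = 1 gives:
r = [2/5, 2/5, 1/5]
assert all(abs(x - y) < 1e-12 for x, y in zip(matvec(M, r), r))  # r = M.r
```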
Imagine a random web surfer:
▪ At any time 𝒕, the surfer is on some page 𝑖
▪ At time 𝒕 + 𝟏, the surfer follows an out-link from 𝒊 uniformly at random
▪ Ends up on some page 𝒋 linked from 𝒊
▪ Process repeats indefinitely
Let:
𝒑(𝒕) … vector whose 𝑖-th coordinate is the prob. that the surfer is at page 𝑖 at time 𝑡
▪ So, 𝒑(𝒕) is a probability distribution over pages
Where is the surfer at time t+1?
▪ Follow a link uniformly at random: 𝒑(𝒕 + 𝟏) = 𝑴 ⋅ 𝒑(𝒕)
Suppose the random walk reaches a state where 𝒑(𝒕 + 𝟏) = 𝑴 ⋅ 𝒑(𝒕) = 𝒑(𝒕);
then 𝒑(𝑡) is a stationary distribution of the random walk.
Our original rank vector 𝒓 satisfies 𝒓 = 𝑴 ⋅ 𝒓
▪ So, 𝒓 is a stationary distribution for the random walk
Recall from lecture 2 (eigenvector centrality): let 𝑨 ∈ {𝟎, 𝟏}^(𝒏×𝒏) be the adjacency matrix of an undirected graph, e.g. a 4-node graph with edges 1–2, 1–3, 1–4, 3–4:

        4  3  2  1
  4  [  0  1  0  1 ]
A = 3[  1  0  0  1 ]
  2  [  0  0  0  1 ]
  1  [  1  1  1  0 ]
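As a quick sketch of that recap, eigenvector centrality can be approximated by power iteration on A. Plain Python, using node order 1, 2, 3, 4 and the small undirected example graph (edges 1–2, 1–3, 1–4, 3–4 — an assumption matching the matrix above):

```python
# Power iteration on the adjacency matrix A to approximate
# eigenvector centrality (the leading eigenvector of A).
A = [
    [0, 1, 1, 1],  # node 1
    [1, 0, 0, 0],  # node 2
    [1, 0, 0, 1],  # node 3
    [1, 0, 1, 0],  # node 4
]

def eigenvector_centrality(A, iters=100):
    n = len(A)
    c = [1.0 / n] * n
    for _ in range(iters):
        c = [sum(A[i][j] * c[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in c) ** 0.5  # L2-normalize each step
        c = [x / norm for x in c]
    return c

c = eigenvector_centrality(A)
# Node 1 (degree 3) ends up with the highest centrality.
assert max(range(4), key=lambda i: c[i]) == 0
```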
CS224W: Machine Learning with Graphs
Jure Leskovec, Stanford University
http://cs224w.stanford.edu
Given a graph with n nodes, we use an iterative procedure:
▪ Assign each node an initial PageRank
▪ Repeat until convergence (Σ_i |r_i^(t+1) − r_i^(t)| < 𝜖):
▪ Calculate the PageRank of each node: r_j^(t+1) = Σ_{i→j} r_i^(t) / d_i
𝒅_𝒊 …. out-degree of node 𝒊
Given a web graph with N nodes, where the nodes are pages and edges are hyperlinks.
Power iteration: a simple iterative scheme
▪ Initialize: 𝒓^(0) = [1/𝑁, …, 1/𝑁]^𝑇
▪ Iterate: 𝒓^(𝒕+𝟏) = 𝑴 ∙ 𝒓^(𝑡)
▪ Stop when ‖𝒓^(𝒕+𝟏) − 𝒓^(𝑡)‖_1 < 𝜀
(‖𝒙‖_1 = Σ_{i=1}^{N} |𝒙_𝒊| is the L1 norm; any other vector norm, e.g. Euclidean, can be used)
Example:
r_y     1/3   1/3   5/12    9/24        6/15
r_a  =  1/3   3/6   1/3    11/24   …    6/15
r_m     1/3   1/6   3/12    1/6         3/15
Iteration 0, 1, 2, …
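The iterations in the example can be reproduced with a short plain-Python power iteration (a sketch, using the L1 stopping rule from above):

```python
# Power iteration r(t+1) = M r(t) for the y/a/m example, starting uniform.
M = [
    [0.5, 0.5, 0.0],  # y
    [0.5, 0.0, 1.0],  # a
    [0.0, 0.5, 0.0],  # m
]

def power_iteration(M, eps=1e-10):
    n = len(M)
    r = [1.0 / n] * n
    while True:
        r_new = [sum(M[i][j] * r[j] for j in range(n)) for i in range(n)]
        if sum(abs(a - b) for a, b in zip(r_new, r)) < eps:  # L1 stopping rule
            return r_new
        r = r_new

r = power_iteration(M)
print(r)  # converges to approximately [0.4, 0.4, 0.2] = [6/15, 6/15, 3/15]
```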
Power Iteration:
▪ Set 𝒓_𝒋 ← 1/N
▪ 1: 𝒓′_𝒋 ← Σ_{i→j} 𝒓_𝒊 / 𝑑_𝑖
▪ 2: If |𝒓 − 𝒓′| > 𝜀: 𝒓 ← 𝒓′
▪ 3: go to 1
For the example graph (y, a, m), step 1 computes:
𝒓_𝒚 = 𝒓_𝒚/𝟐 + 𝒓_𝒂/𝟐
𝒓_𝒂 = 𝒓_𝒚/𝟐 + 𝒓_𝒎
𝒓_𝒎 = 𝒓_𝒂/𝟐
Two problems:
(1) Some pages are dead ends (have no out-links)
▪ Such pages cause importance to “leak out”
(2) Spider traps (all out-links are within a group)
▪ Such groups eventually absorb all importance
The “Spider trap” problem:
Example: a links to b; b links only to itself.
Iteration: 0, 1, 2, 3 …
r_a     1   0   0   0
     =
r_b     0   1   1   1
The “Dead end” problem:
Example: a links to b; b has no out-links.
Iteration: 0, 1, 2, 3 …
r_a     1   0   0   0
     =
r_b     0   1   0   0
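Both failure modes can be seen numerically. A small sketch iterating r(t+1) = M · r(t) on the two a/b toy graphs above:

```python
# Iterate r(t+1) = M r(t) for a fixed number of steps.
def iterate(M, r, steps):
    for _ in range(steps):
        r = [sum(M[i][j] * r[j] for j in range(len(r))) for i in range(len(M))]
    return r

# Spider trap: a -> b, b -> b. Column b keeps all mass on b.
M_trap = [[0.0, 0.0],
          [1.0, 1.0]]
print(iterate(M_trap, [1.0, 0.0], 3))  # [0.0, 1.0]: b absorbs everything

# Dead end: a -> b, b has no out-links. Column b is all zeros.
M_dead = [[0.0, 0.0],
          [1.0, 0.0]]
print(iterate(M_dead, [1.0, 0.0], 3))  # [0.0, 0.0]: importance leaks out
```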
Solution for spider traps: At each time step, the random surfer has two options
▪ With prob. β, follow a link at random
▪ With prob. 1−β, jump to a random page
▪ Common values for β are in the range 0.8 to 0.9
Surfer will teleport out of the spider trap within a few time steps.
Teleports: Follow random teleport links with total probability 1.0 from dead-ends
▪ Adjust matrix accordingly. For the example where m is a dead end:

Before:              After:
     y   a   m            y   a   m
y  [ ½   ½   0 ]     y  [ ½   ½   ⅓ ]
a  [ ½   0   0 ]     a  [ ½   0   ⅓ ]
m  [ 0   ½   0 ]     m  [ 0   ½   ⅓ ]
Why are dead-ends and spider traps a problem
and why do teleports solve the problem?
Spider-traps are not a problem, but with traps
PageRank scores are not what we want
▪ Solution: Never get stuck in a spider trap by
teleporting out of it in a finite number of steps
Dead-ends are a problem
▪ The matrix is not column stochastic so our initial
assumptions are not met
▪ Solution: Make matrix column stochastic by always
teleporting when there is nowhere else to go
Google’s solution that does it all:
At each step, the random surfer has two options:
▪ With probability β, follow a link at random
▪ With probability 1−β, jump to some random page
The PageRank equation [Brin & Page, 1998]:
r_j = Σ_{i→j} β · r_i / d_i + (1−β) · 1/N
PageRank solves for 𝒓 = 𝑮𝒓 and can be
efficiently computed by power iteration of the
stochastic adjacency matrix (𝑮)
Adding random uniform teleportation solves
issues of dead-ends and spider-traps
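A minimal sketch of the full scheme: power iteration with a teleport term, applied to a y/a/m variant where m links only to itself (a spider trap). The choice β = 0.85 is an illustrative assumption:

```python
# PageRank with teleport: r(t+1) = beta * M r(t) + (1 - beta)/N.
def pagerank(M, beta=0.85, eps=1e-10):
    n = len(M)
    r = [1.0 / n] * n
    while True:
        r_new = [
            beta * sum(M[i][j] * r[j] for j in range(n)) + (1 - beta) / n
            for i in range(n)
        ]
        if sum(abs(a - b) for a, b in zip(r_new, r)) < eps:
            return r_new
        r = r_new

# y links to y, a; a links to y, m; m links only to itself (spider trap).
M = [
    [0.5, 0.5, 0.0],
    [0.5, 0.0, 0.0],
    [0.0, 0.5, 1.0],
]
print(pagerank(M))  # m gets the largest score, but no score collapses to 0
```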
CS224W: Machine Learning with Graphs
Jure Leskovec, Stanford University
http://cs224w.stanford.edu
Given:
A bipartite graph of users and items, representing user–item interactions (e.g., purchases)
Goal: Proximity on graphs
▪ What items should we recommend to a user who interacts with item Q?
▪ Intuition: if items Q and P are interacted with by similar users, recommend P when a user interacts with Q
Which pair is more related: (A, A′), (B, B′), or (C, C′)?
Candidate measures:
▪ Shortest path
▪ Common neighbors
Idea:
▪ Every node has some importance
▪ Importance gets evenly split among all edges and
pushed to the neighbors:
Given a set of QUERY_NODES, we simulate a
random walk:
▪ Make a step to a random neighbor and record the visit
(visit count)
▪ With probability ALPHA, restart the walk at one of the
QUERY_NODES
▪ The nodes with the highest visit count have highest
proximity to the QUERY_NODES
Example: a bipartite Pin-and-Board graph; given a set of QUERY NODES Q, simulate a random walk.
Proximity to query node(s) Q (on the bipartite Pin-and-Board graph):

item = QUERY_NODES.sample_by_weight()
for i in range(N_STEPS):
    user = item.get_random_neighbor()
    item = user.get_random_neighbor()
    item.visit_count += 1
    if random() < ALPHA:
        item = QUERY_NODES.sample_by_weight()
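The pseudocode can be made runnable. A self-contained sketch on a tiny hypothetical user–item bipartite graph (the graph, item names, and parameters are all illustrative):

```python
import random
from collections import Counter

# Hypothetical bipartite graph: items Q, P share users u1, u2; X is separate.
item_to_users = {"Q": ["u1", "u2"], "P": ["u1", "u2"], "X": ["u3"]}
user_to_items = {"u1": ["Q", "P"], "u2": ["Q", "P"], "u3": ["X"]}

def random_walk_with_restarts(query_items, n_steps=10000, alpha=0.3, seed=0):
    rng = random.Random(seed)
    visit_count = Counter()
    item = rng.choice(query_items)
    for _ in range(n_steps):
        user = rng.choice(item_to_users[item])  # item -> user step
        item = rng.choice(user_to_items[user])  # user -> item step
        visit_count[item] += 1
        if rng.random() < alpha:                # restart at a query node
            item = rng.choice(query_items)
    return visit_count

counts = random_walk_with_restarts(["Q"])
# P shares both its users with Q, X shares none, so P ranks far above X.
assert counts["P"] > counts["X"]
```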
Pixie Random Walk: proximity to query node(s) Q is measured by the number of visits by random walks starting at Q — items near the query item Q accumulate the highest visit counts.
PageRank:
▪ Teleports to any node
▪ All nodes can have the same probability of the surfer landing there:
𝑺 = [0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]
Topic-Specific PageRank, aka Personalized PageRank (PPR):
▪ Teleports to a specific set of nodes
▪ Nodes can have different probabilities of the surfer landing there:
𝑺 = [0.1, 0, 0, 0.2, 0, 0, 0.5, 0, 0, 0.2]
Random Walk with Restarts:
▪ Topic-Specific PageRank where the teleport is always to the same node:
𝑺 = [0, 0, 0, 0, 𝟏, 0, 0, 0, 0, 0]
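The three variants differ only in the teleport distribution S. A sketch of personalized PageRank by power iteration on the y/a/m example (β = 0.85 and the iteration count are illustrative choices):

```python
# Personalized PageRank: r = beta * M r + (1 - beta) * S,
# where S is the teleport distribution instead of the uniform vector.
def personalized_pagerank(M, S, beta=0.85, iters=100):
    n = len(M)
    r = list(S)
    for _ in range(iters):
        r = [
            beta * sum(M[i][j] * r[j] for j in range(n)) + (1 - beta) * S[i]
            for i in range(n)
        ]
    return r

M = [
    [0.5, 0.5, 0.0],  # y
    [0.5, 0.0, 1.0],  # a
    [0.0, 0.5, 0.0],  # m
]
# Random walk with restarts: teleport always to node y.
S = [1.0, 0.0, 0.0]
r = personalized_pagerank(M, S)
assert r[0] > 1 / 3  # y is boosted relative to the uniform-teleport case
```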
CS224W: Machine Learning with Graphs
Jure Leskovec, Stanford University
http://cs224w.stanford.edu
Recall: encoder as an embedding lookup — 𝒁 is the embedding matrix (one column per node; the number of rows is the dimension/size of the embeddings), and the embedding vector for a specific node is the corresponding column of 𝒁.
Simplest node similarity: Nodes 𝑢, 𝑣 are similar if they are connected by an edge.
This means: 𝐳_𝑣^Τ 𝐳_𝑢 = 𝐴_{𝑢,𝑣}, which is the (𝑢, 𝑣) entry of the graph adjacency matrix 𝐴.
Therefore, 𝒁^𝑇 𝒁 = 𝐴.
The embedding dimension 𝑑 (number of rows in 𝒁) is much smaller than the number of nodes 𝑛.
Exact factorization 𝐴 = 𝒁^𝑻 𝒁 is generally not possible.
However, we can learn 𝒁 approximately.
Objective: min_𝐙 ‖A − 𝒁^𝑇 𝒁‖_2
▪ We optimize 𝒁 such that it minimizes the L2 norm (Frobenius norm) of A − 𝒁^𝑇 𝒁
▪ Note: in Lecture 3 we used softmax instead of L2, but the goal of approximating A with 𝒁^𝑇 𝒁 is the same.
Conclusion: Inner product decoder with node
similarity defined by edge connectivity is
equivalent to matrix factorization of 𝐴.
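This objective can be minimized directly by gradient descent — a hedged sketch, not the method used in the lecture. For symmetric A, the gradient of ‖A − ZᵀZ‖_F² with respect to Z is −4 Z (A − ZᵀZ). Exact recovery is generally impossible when d ≪ n, so we only check that the loss decreases (the graph and hyperparameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[0, 1, 1, 1],
              [1, 0, 0, 0],
              [1, 0, 0, 1],
              [1, 0, 1, 0]], dtype=float)  # small undirected example graph
d, n = 2, 4                                # embedding dim << #nodes
Z = 0.1 * rng.standard_normal((d, n))      # random initial embeddings

def loss(Z):
    """Squared Frobenius norm of the reconstruction error A - Z^T Z."""
    return np.sum((A - Z.T @ Z) ** 2)

losses = [loss(Z)]
for _ in range(500):
    E = A - Z.T @ Z
    Z = Z + 0.01 * 4 * Z @ E               # gradient descent step
    losses.append(loss(Z))

assert losses[-1] < losses[0]  # the approximation improves
```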
DeepWalk and node2vec have a more
complex node similarity definition based on
random walks
DeepWalk is equivalent to matrix
factorization of the following complex matrix
expression:
log( vol(𝐺) · ( (1/𝑇) Σ_{𝑟=1}^{𝑇} (𝐷^{−1}𝐴)^𝑟 ) · 𝐷^{−1} ) − log 𝑏

where vol(𝐺) = Σ_𝑖 Σ_𝑗 𝐴_{𝑖,𝑗} is the volume of the graph, 𝐷 is the diagonal degree matrix with 𝐷_{𝑢,𝑢} = deg(𝑢), 𝑇 is the context window size, and 𝑏 is the number of negative samples.
Limitation: DeepWalk / node2vec embeddings do not incorporate node features — e.g., a feature vector attached to each node (such as protein properties in a protein-protein interaction graph).