
04: PageRank


CS224W: Machine Learning with Graphs

Jure Leskovec, Stanford University


http://cs224w.stanford.edu
ANNOUNCEMENTS
• Homework 1 will be released after class
• Next Thursday (10/07): Colab 1 due, Colab 2 out
  o Do Colab 0! It has almost everything you need to complete Colab 1.
• Office hours: we've added Zoom links to our OH calendar.
  o See http://web.stanford.edu/class/cs224w/oh.html for the OH calendar, Zoom links, and QueueStatus link.

In this lecture, we investigate graph analysis and learning from a matrix perspective.
 Treating a graph as a matrix allows us to:
▪ Determine node importance via random walk (PageRank)
▪ Obtain node embeddings via matrix factorization (MF)
▪ View other node embeddings (e.g., node2vec) as MF
 Random walks, matrix factorization, and node embeddings are closely related!
4 0 1 0 1
 
3 1 0 0 1
A=
0 0 0 1
2  
1 0 
1  1 1
Q: What does the Web "look like" at a global level?
 Web as a graph:
▪ Nodes = web pages
▪ Edges = hyperlinks
▪ Side issue: What is a node?
  ▪ Dynamic pages created on the fly
  ▪ "Dark matter": inaccessible, database-generated pages

[Figure: example web pages as a graph. A personal page ("I teach a class on Networks. CS224W: Classes are in the Gates building") links to the Computer Science Department at Stanford, which links to Stanford University.]
 In the early days of the Web, links were navigational.
 Today many links are transactional (used not to navigate from page to page, but to post, comment, like, buy, …).
[Figures: other networks with similar link structure: citations; references in an encyclopedia.]
 How is the Web linked?
 What is the "map" of the Web?
Web as a directed graph [Broder et al. 2000]:
▪ Given node v, what nodes can v reach?
▪ What other nodes can reach v?

[Figure: a small directed graph on nodes B, C, D, E, F, G illustrating reachability.]
 Not all web pages are equally "important":
thispersondoesnotexist.com vs. www.stanford.edu
 There is large diversity in web-graph node connectivity.
 So, let's rank the pages using the web-graph link structure!
 We will cover the following Link Analysis
approaches to compute the importance of
nodes in a graph:
▪ PageRank
▪ Personalized PageRank (PPR)
▪ Random Walk with Restarts

 Idea: Links as votes
▪ A page is more important if it has more links
▪ Incoming links? Outgoing links?
 Think of in-links as votes:
▪ www.stanford.edu has 23,400 in-links
▪ thispersondoesnotexist.com has 1 in-link
 Are all in-links equal?
▪ Links from important pages count more
▪ This is a recursive question!

 A "vote" from an important page is worth more:
▪ Each link's vote is proportional to the importance of its source page
▪ If page i with importance r_i has d_i out-links, each link gets r_i / d_i votes
▪ Page j's own importance r_j is the sum of the votes on its in-links

[Figure: pages i and k both link to page j; i has 3 out-links and k has 4, so r_j = r_i/3 + r_k/4.]
 A page is important if it is pointed to by other important pages.
 Define the "rank" r_j for node j:

r_j = Σ_{i→j} r_i / d_i,   where d_i is the out-degree of node i

Example ("The web in 1839"): pages y, a, m, where y links to itself and to a, a links to y and to m, and m links to a.

"Flow" equations:
r_y = r_y/2 + r_a/2
r_a = r_y/2 + r_m
r_m = r_a/2

You might wonder: let's just use Gaussian elimination to solve this system of linear equations. Bad idea: it does not scale to web-sized graphs!
 Stochastic adjacency matrix M:
▪ d_i is the out-degree of node i
▪ If i → j, then M_ji = 1/d_i
▪ M is a column stochastic matrix: columns sum to 1
 Rank vector r: an entry per page
▪ r_i is the importance score of page i
▪ Σ_i r_i = 1
 The flow equations can then be written as r = M · r
Example (pages y, a, m):

       y    a    m
y  [   ½    ½    0  ]
a  [   ½    0    1  ]
m  [   0    ½    0  ]

Flow equations:              In matrix form (r = M · r):
r_y = r_y/2 + r_a/2          [r_y]   [ ½  ½  0 ] [r_y]
r_a = r_y/2 + r_m            [r_a] = [ ½  0  1 ] [r_a]
r_m = r_a/2                  [r_m]   [ 0  ½  0 ] [r_m]
 Imagine a random web surfer:
▪ At any time t, the surfer is on some page i
▪ At time t+1, the surfer follows an out-link from i uniformly at random
▪ It ends up on some page j linked from i
▪ The process repeats indefinitely
 Let p(t) be the vector whose i-th coordinate is the probability that the surfer is at page i at time t
▪ So p(t) is a probability distribution over pages
 Where is the surfer at time t+1?
▪ It follows a link uniformly at random: p(t+1) = M · p(t)
 Suppose the random walk reaches a state p(t+1) = M · p(t) = p(t); then p(t) is a stationary distribution of the random walk
 Our original rank vector r satisfies r = M · r
▪ So r is a stationary distribution for the random walk
 Recall from Lecture 2 (eigenvector centrality): let A ∈ {0,1}^{n×n} be the adjacency matrix of an undirected graph:

A = [ 0 1 0 1 ]
    [ 1 0 0 1 ]
    [ 0 0 0 1 ]
    [ 1 1 1 0 ]

 Eigenvectors of the adjacency matrix are vectors c satisfying λc = Ac
 c: eigenvector; λ: eigenvalue
 Note:
▪ This is the definition of eigenvector centrality (for undirected graphs).
▪ PageRank is defined for directed graphs.
 The flow equation: 1 · r = M · r

  [r_y]   [ ½  ½  0 ] [r_y]
  [r_a] = [ ½  0  1 ] [r_a]
  [r_m]   [ 0  ½  0 ] [r_m]

 So the rank vector r is an eigenvector of the stochastic adjacency matrix M (with eigenvalue 1)
▪ Starting from any vector u, the limit M(M(… M(M · u))) is the long-term distribution of the surfers
▪ PageRank = limiting distribution = principal eigenvector of M
▪ Note: if r is the limit of the product MM…Mu, then r satisfies the flow equation 1 · r = M · r, so r is the principal eigenvector of M with eigenvalue 1
 We can now efficiently solve for r! The method is called power iteration.
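
As a quick sanity check (our own sketch, not from the slides), NumPy's eigendecomposition recovers the same vector for the y/a/m example matrix; the variable names are ours:

import numpy as np

# Column-stochastic matrix M for the y, a, m example
M = np.array([[0.5, 0.5, 0.0],
              [0.5, 0.0, 1.0],
              [0.0, 0.5, 0.0]])

eigvals, eigvecs = np.linalg.eig(M)
i = np.argmax(eigvals.real)        # principal eigenvalue of M (equals 1)
r = eigvecs[:, i].real
r = r / r.sum()                    # normalize so that sum_i r_i = 1

print(eigvals.real[i])  # 1.0
print(r)                # [0.4, 0.4, 0.2], the stationary distribution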
 PageRank:
▪ Measures importance of nodes in a graph using the link structure of the web
▪ Models a random web surfer using the stochastic adjacency matrix M
▪ PageRank solves r = M · r, where r can be viewed both as the principal eigenvector of M and as the stationary distribution of a random walk over the graph

Given a graph with n nodes, we use an iterative procedure:
 Assign each node an initial PageRank
 Repeat until convergence (Σ_i |r_i^(t+1) − r_i^(t)| < ε):
▪ Calculate the PageRank of each node:

r_j^(t+1) = Σ_{i→j} r_i^(t) / d_i,   where d_i is the out-degree of node i
 Given a web graph with N nodes, where the nodes are pages and the edges are hyperlinks
 Power iteration: a simple iterative scheme
▪ Initialize: r^(0) = [1/N, …, 1/N]^T
▪ Iterate: r^(t+1) = M · r^(t)
▪ Stop when ‖r^(t+1) − r^(t)‖₁ < ε
 ‖x‖₁ = Σ_{i=1}^N |x_i| is the L1 norm (any other vector norm, e.g. Euclidean, also works)
 About 50 iterations is sufficient to estimate the limiting solution.
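
As a minimal sketch (ours, with our own variable names; not code from the lecture), power iteration on the y/a/m example takes a few lines of NumPy:

import numpy as np

# Column-stochastic matrix M for the y, a, m example: M[j, i] = 1/d_i if i -> j
M = np.array([[0.5, 0.5, 0.0],
              [0.5, 0.0, 1.0],
              [0.0, 0.5, 0.0]])

N = M.shape[0]
r = np.full(N, 1.0 / N)                  # r^(0) = [1/N, ..., 1/N]^T
eps = 1e-10

for _ in range(100):                     # ~50 iterations usually suffice
    r_next = M @ r                       # r^(t+1) = M . r^(t)
    if np.abs(r_next - r).sum() < eps:   # L1-norm stopping criterion
        r = r_next
        break
    r = r_next

print(r)  # converges to [6/15, 6/15, 3/15] = [0.4, 0.4, 0.2]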


 Power Iteration:
▪ Set r_j ← 1/N
▪ 1: r′_j ← Σ_{i→j} r_i / d_i
▪ 2: If |r − r′| > ε: r ← r′
▪ 3: go to 1

 Example (the y, a, m graph from before):

      y   a   m            r_y = r_y/2 + r_a/2
y  [  ½   ½   0 ]          r_a = r_y/2 + r_m
a  [  ½   0   1 ]          r_m = r_a/2
m  [  0   ½   0 ]

Iteration:   0     1     2      3          limit
r_y:        1/3   1/3   5/12   9/24   …   6/15
r_a:        1/3   3/6   1/3    11/24  …   6/15
r_m:        1/3   1/6   3/12   1/6    …   3/15
 Power iteration: r^(t+1) = M · r^(t), or equivalently r_j^(t+1) = Σ_{i→j} r_i^(t) / d_i

 Does this converge?
 Does it converge to what we want?
 Are the results reasonable?
Two problems:
 (1) Some pages are dead ends (have no out-links)
▪ Such pages cause importance to "leak out"
 (2) Spider traps (all out-links are within the group)
▪ Eventually spider traps absorb all importance
 The "spider trap" problem:

[Figure: node a links to node b; b links only to itself.]

 Example:
Iteration:  0  1  2  3  …
r_a:        1  0  0  0
r_b:        0  1  1  1
 The "dead end" problem:

[Figure: node a links to node b; b has no out-links.]

 Example:
Iteration:  0  1  2  3  …
r_a:        1  0  0  0
r_b:        0  1  0  0
 Solution for spider traps: at each time step, the random surfer has two options
▪ With probability β, follow a link at random
▪ With probability 1−β, jump to a random page
▪ Common values for β are in the range 0.8 to 0.9
 The surfer will teleport out of a spider trap within a few time steps

[Figure: the y/a/m spider-trap graph, before and after adding teleport links.]
 Teleports: follow random teleport links with total probability 1.0 from dead ends
▪ Adjust the matrix accordingly

[Figure: the y/a/m graph where m is a dead end, before and after adding teleports from m.]

Before:                  After:
     y    a    m              y    a    m
y [  ½    ½    0  ]      y [  ½    ½    ⅓  ]
a [  ½    0    0  ]      a [  ½    0    ⅓  ]
m [  0    ½    0  ]      m [  0    ½    ⅓  ]
Why are dead ends and spider traps a problem, and why do teleports solve it?
 Spider traps are not a mathematical problem, but with traps the PageRank scores are not what we want
▪ Solution: never get stuck in a spider trap by teleporting out of it within a finite number of steps
 Dead ends are a problem
▪ The matrix is not column stochastic, so our initial assumptions are not met
▪ Solution: make the matrix column stochastic by always teleporting when there is nowhere else to go
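
A minimal sketch (ours) of the dead-end fix on the y/a/m example from the previous slide: replace every all-zero column of M with a uniform teleport column.

import numpy as np

# y/a/m example where m is a dead end: its column in M is all zeros
M = np.array([[0.5, 0.5, 0.0],
              [0.5, 0.0, 0.0],
              [0.0, 0.5, 0.0]])
N = M.shape[0]

dead_ends = M.sum(axis=0) == 0   # all-zero columns are dead ends
M[:, dead_ends] = 1.0 / N        # always teleport from a dead end

print(M)   # column m becomes [1/3, 1/3, 1/3]: M is column stochastic again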
 Google's solution that does it all: at each step, the random surfer has two options:
▪ With probability β, follow a link at random
▪ With probability 1−β, jump to some random page

 PageRank equation [Brin-Page, '98]:

r_j = Σ_{i→j} β · r_i / d_i + (1−β) · 1/N,   where d_i is the out-degree of node i

This formulation assumes that M has no dead ends. We can either preprocess the matrix M to remove all dead ends or explicitly follow random teleport links with probability 1.0 from dead ends.
 PageRank equation [Brin-Page, '98]:

r_j = Σ_{i→j} β · r_i / d_i + (1−β) · 1/N

 The Google Matrix G:

G = β · M + (1−β) · [1/N]_{N×N}

where [1/N]_{N×N} is the N×N matrix with all entries 1/N.

 We have a recursive problem: r = G · r, and the power method still works!
 What is β?
▪ In practice β = 0.8 to 0.9 (make about 5 steps on average, then jump)
Example (pages y, a, m, where y links to itself and to a, a links to y and to m, and m links only to itself: a spider trap), with β = 0.8:

           M                 [1/N]_{N×N}
     ½    ½    0            ⅓    ⅓    ⅓
0.8  ½    0    0    + 0.2   ⅓    ⅓    ⅓
     0    ½    1            ⅓    ⅓    ⅓

        7/15   7/15   1/15
G  =    7/15   1/15   1/15
        1/15   7/15   13/15

Power iteration:
Iteration:   0     1      2      3          limit
r_y:        1/3   0.33   0.24   0.26   …   7/33
r_a:        1/3   0.20   0.20   0.18   …   5/33
r_m:        1/3   0.46   0.52   0.56   …   21/33
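
A short sketch (ours) that reproduces this example; with β = 0.8 the iterates approach the 7/33, 5/33, 21/33 shown above:

import numpy as np

beta = 0.8
# y/a/m example where m is a spider trap (self-loop)
M = np.array([[0.5, 0.5, 0.0],
              [0.5, 0.0, 0.0],
              [0.0, 0.5, 1.0]])
N = M.shape[0]

# Google matrix: G = beta * M + (1 - beta) * [1/N]_{NxN}
G = beta * M + (1 - beta) * np.full((N, N), 1.0 / N)

r = np.full(N, 1.0 / N)
for _ in range(100):
    r = G @ r                # power iteration on G

print(r * 33)  # approximately [7, 5, 21], i.e. r = [7/33, 5/33, 21/33]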

[Image omitted. Image credit: Wikipedia.]
 PageRank solves r = G · r and can be efficiently computed by power iteration of the Google matrix G
 Adding random uniform teleportation solves the issues of dead ends and spider traps

 Given: a bipartite graph representing user-item interactions (e.g., purchases)

[Figure: bipartite graph with items on one side and users on the other.]
 Goal: proximity on graphs
▪ What items should we recommend to a user who interacts with item Q?
▪ Intuition: if items Q and P are interacted with by similar users, recommend P when a user interacts with Q

[Figure: bipartite user-item graph highlighting item Q.]
 Which is more related: A,A′ or B,B′?
▪ Shortest path
 Which is more related: A,A′, B,B′, or C,C′?
▪ Shortest path; common neighbors
 Which is more related: A,A′, B,B′, C,C′, or D,D′?
▪ Personalized PageRank / Random Walk with Restarts

[Figure: node pairs A,A′ through D,D′ at varying degrees of graph proximity, used to compare the three proximity measures.]


 PageRank:
▪ Ranks nodes by "importance"
▪ Teleports with uniform probability to any node in the network
 Personalized PageRank:
▪ Ranks proximity of nodes to the teleport nodes S
 Proximity on graphs:
▪ Q: What is the most related item to item Q?
▪ Random Walks with Restarts: teleport back to the starting node, S = {Q}
 Idea:
▪ Every node has some importance
▪ Importance gets evenly split among all edges and pushed to the neighbors
 Given a set of QUERY_NODES, we simulate a random walk:
▪ Make a step to a random neighbor and record the visit (visit count)
▪ With probability ALPHA, restart the walk at one of the QUERY_NODES
▪ The nodes with the highest visit counts have the highest proximity to the QUERY_NODES
 Proximity to query node(s) Q (bipartite pin-and-board graph):

item = QUERY_NODES.sample_by_weight()
for i in range(N_STEPS):
    user = item.get_random_neighbor()
    item = user.get_random_neighbor()
    item.visit_count += 1
    if random() < ALPHA:
        item = QUERY_NODES.sample_by_weight()

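A self-contained, runnable version of the walk (our own sketch, not Pixie's implementation): the graph layout, uniform sampling in place of sample_by_weight, and the names item_to_users, user_to_items, alpha, n_steps are all assumptions.

import random
from collections import Counter

def random_walk_with_restarts(item_to_users, user_to_items, query_nodes,
                              alpha=0.5, n_steps=10_000):
    """Count item visits of a random walk that restarts at the query items."""
    visit_count = Counter()
    item = random.choice(query_nodes)               # uniform stand-in for sample_by_weight
    for _ in range(n_steps):
        user = random.choice(item_to_users[item])   # item -> random user
        item = random.choice(user_to_items[user])   # user -> random item
        visit_count[item] += 1
        if random.random() < alpha:                 # restart at a query node
            item = random.choice(query_nodes)
    return visit_count

# Toy bipartite graph: items Q, P, R and users u1, u2, u3
item_to_users = {"Q": ["u1", "u2"], "P": ["u2", "u3"], "R": ["u3"]}
user_to_items = {"u1": ["Q"], "u2": ["Q", "P"], "u3": ["P", "R"]}

counts = random_walk_with_restarts(item_to_users, user_to_items, ["Q"])
print(counts.most_common())  # items with the most visits are closest to Q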
[Figure: Pixie random walk starting at query item Q on a bipartite pin-and-board graph. Pins close to Q ("Yummm Strawberries", "Smoothies", "Smoothie Madness!") accumulate high visit counts (e.g., 16, 14, 9), while distant pins receive few visits (e.g., 1). Users 1-4 connect the pins.]
 Why is this a good solution? Because the "similarity" considers:
▪ Multiple connections
▪ Multiple paths
▪ Direct and indirect connections
▪ Degree of the node
 PageRank:
▪ Teleports to any node
▪ All nodes have the same probability of the surfer landing there:
  S = [0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]
 Topic-Specific PageRank, aka Personalized PageRank:
▪ Teleports to a specific set of nodes
▪ Nodes can have different probabilities of the surfer landing there:
  S = [0.1, 0, 0, 0.2, 0, 0, 0.5, 0, 0, 0.2]
 Random Walk with Restarts:
▪ Topic-Specific PageRank where the teleport is always to the same node:
  S = [0, 0, 0, 0, 1, 0, 0, 0, 0, 0]
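
All three variants run the same iteration and differ only in the teleport vector S. A sketch (ours) of power iteration with a general S; pagerank, beta, and iters are our own names:

import numpy as np

def pagerank(M, S, beta=0.8, iters=100):
    """Power iteration of r = beta * M r + (1 - beta) * S with teleport vector S."""
    r = np.full(M.shape[0], 1.0 / M.shape[0])
    for _ in range(iters):
        r = beta * (M @ r) + (1 - beta) * S
    return r

M = np.array([[0.5, 0.5, 0.0],      # y/a/m example matrix
              [0.5, 0.0, 1.0],
              [0.0, 0.5, 0.0]])

uniform = np.full(3, 1.0 / 3)         # ordinary PageRank
one_hot = np.array([1.0, 0.0, 0.0])   # random walk with restarts at node y

print(pagerank(M, uniform))
print(pagerank(M, one_hot))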



 A graph is naturally represented as a matrix.
 We defined a random walk process over the graph:
▪ A random surfer moves across the links, with random teleportation
▪ Stochastic adjacency matrix M
 PageRank = the limiting distribution of the surfer's location, which represents node importance
▪ Corresponds to the leading eigenvector of the transformed adjacency matrix M

 Recall: encoder as an embedding lookup

[Figure: embedding matrix Z, with one column per node; each column is the embedding vector for a specific node, and the number of rows is the dimension/size of the embeddings.]

Objective: maximize z_v^T z_u for node pairs (u, v) that are similar

 Simplest node similarity: nodes u, v are similar if they are connected by an edge
 This means: z_v^T z_u = A_{u,v}, which is the (u, v) entry of the graph adjacency matrix A
 Therefore, Z^T Z = A

A = [ 0 1 0 1 ]
    [ 1 0 0 1 ]
    [ 0 0 0 1 ]
    [ 1 1 1 0 ]
 The embedding dimension d (number of rows in Z) is much smaller than the number of nodes n.
 Exact factorization A = Z^T Z is generally not possible.
 However, we can learn Z approximately.
 Objective: min_Z ‖A − Z^T Z‖₂
▪ We optimize Z such that it minimizes the L2 norm (Frobenius norm) of A − Z^T Z
▪ Note: in Lecture 3 we used softmax instead of the L2 norm, but the goal of approximating A with Z^T Z is the same
 Conclusion: an inner-product decoder with node similarity defined by edge connectivity is equivalent to matrix factorization of A.
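
A minimal sketch (ours) of this approximate factorization via gradient descent on ‖A − Z^T Z‖_F²; the learning rate, dimension d = 2, and step count are arbitrary choices:

import numpy as np

A = np.array([[0, 1, 0, 1],
              [1, 0, 0, 1],
              [0, 0, 0, 1],
              [1, 1, 1, 0]], dtype=float)

n, d, lr = A.shape[0], 2, 0.01
rng = np.random.default_rng(0)
Z = rng.normal(scale=0.1, size=(d, n))   # embedding matrix: one column per node

for _ in range(2000):
    E = A - Z.T @ Z                      # residual of the factorization
    grad = -2 * Z @ (E + E.T)            # gradient of ||A - Z^T Z||_F^2 w.r.t. Z
    Z -= lr * grad

print(np.round(Z.T @ Z, 2))  # approximates A as well as a rank-d (PSD) factorization can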
 DeepWalk and node2vec have a more complex node similarity definition based on random walks.
 DeepWalk is equivalent to matrix factorization of the complex matrix expression spelled out below.

Network Embedding as Matrix Factorization: Unifying DeepWalk, LINE, PTE, and node2vec. WSDM '18.

log( vol(G) · (1/T) · Σ_{r=1}^{T} (D^{−1} A)^r · D^{−1} ) − log b

where:
▪ vol(G) = Σ_i Σ_j A_{i,j} is the volume of the graph
▪ D is the diagonal degree matrix: D_{u,u} = deg(u)
▪ (D^{−1} A)^r is a power of the normalized adjacency matrix (see Lecture 3, slide 30)
▪ T = |N_R(u)| is the context window size
▪ b is the number of negative samples

 node2vec can also be formulated as a matrix factorization (albeit of a more complex matrix)
 Refer to the paper for more details:
Network Embedding as Matrix Factorization: Unifying DeepWalk, LINE, PTE, and node2vec. WSDM '18.
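
A dense sketch (ours) that assembles this matrix for a small undirected graph with no isolated nodes; deepwalk_matrix, T, and b are our own names for the quantities above:

import numpy as np

def deepwalk_matrix(A, T=10, b=1):
    """Build the matrix that DeepWalk implicitly factorizes (per the WSDM'18 paper)."""
    vol = A.sum()                                    # vol(G) = sum_ij A_ij
    D_inv = np.diag(1.0 / A.sum(axis=1))             # D^{-1}; assumes no isolated nodes
    P = D_inv @ A                                    # normalized adjacency D^{-1} A
    S = sum(np.linalg.matrix_power(P, r) for r in range(1, T + 1)) / T
    M = vol * S @ D_inv
    return np.log(np.maximum(M, 1e-12)) - np.log(b)  # elementwise log; clip zero entries

A = np.array([[0, 1, 0, 1],
              [1, 0, 0, 1],
              [0, 0, 0, 1],
              [1, 1, 1, 0]], dtype=float)
print(deepwalk_matrix(A).round(2))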
Limitations of node embeddings via matrix factorization and random walks:
 (1) Cannot obtain embeddings for nodes not in the training set.

[Figure: training graph on nodes 1-4; a new node 5 is added at test time (e.g., a new user in a social network). We cannot compute its embedding with DeepWalk / node2vec; all node embeddings must be recomputed.]
 (2) Cannot capture structural similarity:

[Figure: a graph with nodes 1-5 on one side and nodes 10-13 far away on the other; nodes 1 and 11 occupy the same structural position.]

 Nodes 1 and 11 are structurally similar: each is part of one triangle, has degree 2, …
 However, they have very different embeddings:
▪ It is unlikely that a random walk will reach node 11 from node 1.
 DeepWalk and node2vec do not capture structural similarity.
 (3) Cannot utilize node, edge, and graph features

[Figure: graph on nodes 1-5; each node carries a feature vector (e.g., protein properties in a protein-protein interaction graph). DeepWalk / node2vec embeddings do not incorporate such node features.]

 Solution to these limitations: deep representation learning and graph neural networks (to be covered in depth next week)
 PageRank
▪ Measures importance of nodes in a graph
▪ Can be efficiently computed by power iteration of the adjacency matrix
 Personalized PageRank (PPR)
▪ Measures importance of nodes with respect to a particular node or set of nodes
▪ Can be efficiently computed by random walks
 Node embeddings based on random walks can be expressed as matrix factorization
 Viewing graphs as matrices plays a key role in all of the above algorithms!