Exercise 9: PageRank - Solution
1 Problem 1
The goal is to find how many of the k pages we need to add and how we
should interlink them to maximize the PageRank of rolexx.com. Let's assume
that the spammer has already interlinked some pages and see what happens
as the random surfer walks around. The surfer can be in one of three states:
on one of the spammer's pages (K), on one of the N legitimate pages (O), or
on rolexx.com (R). Since the surfer has no memory, we can model this as a
three-state Markov chain (Figure 1).
Maximizing the PageRank of rolexx.com is equivalent to maximizing the
probability that rolexx.com is visited, i.e. the stationary probability p_R of
being in state R. In the stationary distribution we have

p_R = p_K p_KR + p_O p_OR + p_R p_RR,

which transforms to

p_R = (p_K p_KR + p_O p_OR) / (1 - p_RR).

This is the expression we want to maximize. Treating the factor 1/(1 - p_RR)
as a constant, we can ignore it and only need to maximize the numerator
p_K p_KR + p_O p_OR.
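As a sanity check, the balance equation above can be verified numerically. The sketch below uses an arbitrary, made-up 3x3 transition matrix over the states (K, O, R) — the specific values are illustrative assumptions, not part of the exercise — computes the stationary distribution, and checks that p_R = (p_K p_KR + p_O p_OR) / (1 - p_RR):

```python
import numpy as np

# Hypothetical transition matrix over states (K, O, R); rows sum to 1.
P = np.array([
    [0.1, 0.2, 0.7],   # from K: p_KK, p_KO, p_KR
    [0.3, 0.5, 0.2],   # from O: p_OK, p_OO, p_OR
    [0.6, 0.3, 0.1],   # from R: p_RK, p_RO, p_RR
])

# Stationary distribution: left eigenvector of P for eigenvalue 1,
# normalized so its entries sum to 1.
vals, vecs = np.linalg.eig(P.T)
pi = np.real(vecs[:, np.argmax(np.real(vals))])
pi = pi / pi.sum()
p_K, p_O, p_R = pi

# Check the balance equation p_R = (p_K p_KR + p_O p_OR) / (1 - p_RR).
lhs = p_R
rhs = (p_K * P[0, 2] + p_O * P[1, 2]) / (1 - P[2, 2])
print(abs(lhs - rhs) < 1e-10)  # True
```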
Case a.) To maximize p_KR we need to link from all l added pages to
rolexx.com, put no links among the l pages (minimizing p_KK), and put no
links from the l pages to other pages (minimizing p_KO). We also set l = k
to maximize p_K. If we don't put any links from rolexx.com to anywhere,
then it becomes a sink and the random
surfer has no link to take next and will always select an arbitrary page and
jump to it. To prevent that, we link from rolexx.com to one of the k pages.
The final interlinking that maximizes the PageRank of rolexx.com in case a.)
is shown in Figure 2.
Case b.) We keep the link structure from a.), but now we can additionally
control p_OR. To maximize this probability, we link from all the legitimate
pages to rolexx.com. We can further increase the chances of visiting
rolexx.com by also creating a link from every legitimate page to each of the
k pages, so that rolexx.com can also be reached in two hops by the surfer.
The final interlinking is shown in Figure 3.
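The effect of the two link structures can be illustrated with a small power-iteration sketch. The graph sizes, damping factor, and the pagerank helper below are made-up assumptions for illustration, not part of the exercise:

```python
import numpy as np

def pagerank(links, n, d=0.85, iters=200):
    """Power iteration. links[i] lists the pages that page i links to.
    A page with no outgoing links (a sink) jumps to a uniformly random page."""
    r = np.full(n, 1.0 / n)
    for _ in range(iters):
        new = np.full(n, (1 - d) / n)
        for i in range(n):
            outs = links.get(i, [])
            if outs:
                for j in outs:
                    new[j] += d * r[i] / len(outs)
            else:
                new += d * r[i] / n
        r = new
    return r

k, n = 5, 9          # spam pages 0..4, rolexx.com = 5, legitimate pages 6..8
R = k

# Case a): every spam page links to rolexx.com; rolexx.com links back to one
# spam page so it is not a sink. Legitimate pages link among themselves.
case_a = {i: [R] for i in range(k)}
case_a[R] = [0]
case_a[6], case_a[7], case_a[8] = [7], [8], [6]
r_a = pagerank(case_a, n)

# Case b): same structure, but every legitimate page also links to
# rolexx.com and to all k spam pages (which all point at rolexx.com).
case_b = {i: [R] for i in range(k)}
case_b[R] = [0]
for legit in (6, 7, 8):
    case_b[legit] = [R] + list(range(k))
r_b = pagerank(case_b, n)

print(r_a[R], r_b[R])  # rolexx.com's rank rises from case a) to case b)
```

On this toy graph, rolexx.com already has the highest rank under case a), and the extra inbound links of case b) raise it further.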
The spamming in case a.) is feasible in the context of Google: the spammer
can easily create an arbitrary number of new websites and link between
them. In case b.), however, it is rather hard to convince the whole World
Wide Web to link to the k + 1 pages. See the Wikipedia entry on link farms
for a discussion.
2 Problem 2
Let's assume that the weight of the link from page p_i to page p_j is w_ij,
where the weight equals zero if there is no link. The only modification that
needs to be made is to set the entries of the transition matrix to

R_ij = w_ij / (sum_{k=1..n} w_ik),

i.e. the weights normalized by their row sums, so that the transition
probabilities are proportional to the weights.
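A minimal sketch of this normalization in NumPy, using a made-up weight matrix W as an illustrative assumption:

```python
import numpy as np

# Hypothetical nonnegative weight matrix, W[i, j] = w_ij,
# with w_ij = 0 when there is no link from page i to page j.
W = np.array([
    [0.0, 2.0, 1.0],
    [4.0, 0.0, 4.0],
    [0.0, 1.0, 0.0],
])

# R_ij = w_ij / sum_k w_ik: divide each row by its sum so the transition
# probabilities are proportional to the weights. Rows with no outgoing
# links (all-zero rows) are left as zeros here.
row_sums = W.sum(axis=1, keepdims=True)
R = np.divide(W, row_sums, out=np.zeros_like(W), where=row_sums > 0)

print(R)  # each nonzero row of R now sums to 1
```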