Key: INT404 Big Data Analytics CIA II Oct-2021

1. Compute the PageRank vector of the web graph for the following:


Suppose you have a small internet consisting of four web sites: www.page1.com,
www.page2.com, www.page3.com and www.page4.com. Page1.com links to page2.com,
page3.com and page4.com. Page2.com links to page3.com and page4.com. Page3.com
links to page1.com. Page4.com links to page3.com and page1.com.
Construct a directed web graph with four nodes, one for each web site. Find the transition
matrix for the web graph and compute the PageRank vector.

Let www.page1.com be 1
Let www.page2.com be 2
Let www.page3.com be 3
Let www.page4.com be 4
0 0 1 1 2
1 3 0 0 0 

1 3 1 2 0 1 2
 
The transition matrix M is 1 3 1 2 0 0
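The column rule for M can be checked mechanically. The Python sketch below is illustrative only: the `LINKS` dictionary and the `transition_matrix` helper are names introduced here, not part of the exam key. It uses exact fractions so the entries print as 1/3 and 1/2 rather than rounded decimals, and every column should sum to 1.

```python
from fractions import Fraction

# Links from the question: page -> pages it links to (pages numbered 1..4).
LINKS = {1: [2, 3, 4], 2: [3, 4], 3: [1], 4: [3, 1]}

def transition_matrix(links, n):
    """Column-stochastic matrix: M[i][j] = 1/outdeg(j+1) if page j+1 links to page i+1."""
    m = [[Fraction(0)] * n for _ in range(n)]
    for src, dests in links.items():
        for dst in dests:
            m[dst - 1][src - 1] = Fraction(1, len(dests))
    return m

M = transition_matrix(LINKS, 4)
for row in M:
    print([str(x) for x in row])
```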
Initially the importance is uniformly distributed among the 4 nodes, each getting 1/4. Denote by V
the initial rank vector, with all entries equal to 1/4.
The new importance vector is $V_1 = MV$. We iterate, computing $M^2V = M(MV)$, $M^3V = M(M^2V)$ and so on,
and stop when two successive vectors are the same (here, to two decimal places; the entries of M are written as 0.33 and 0.5).
1 4  0.25
1 4  0.25
V  
1 4  0.25
   
1 4  0.25

 0 0 1 0.5 0.25 0.37 


0.33 0 0 0  0.25  0.08 
MV     
0.33 0.5 0 0.5 0.25  0.33
    
0.33 0.5 0 0  0.25  0.20 
 0 0 1 0.5 0.37   0.43
0.33 0 0 0   0.08   0.12 
M 2V  M ( MV )     
0.33 0.5 0 0.5  0.33 0.27 
    
0.33 0.5 0 0   0.20   0.16 

 0.43  0.35
 0.12  0.14 
M V  M (M V )  M 
3 2  
0.27  0.29 
   
 0.16  0.20 
 0.35 0.39 
0.14   0.11
M 4V  M ( M 3V )  M   
0.29  0.29 
   
0.20  0.19 
0.39  0.39 
 0.11  0.13
M V  M (M V )  M 
5 4  
0.29  0.29 
   
0.19  0.19 
0.39   0.38
 0.13  0.13
M 6V  M ( M 5V )  M   
0.29  0.29 
   
0.19  0.19 
 0.38  0.38
 0.13 0.12 
M V  M (M V )  M 
7 6  
0.29  0.29 
   
0.19  0.19 
 0.38  0.38
0.12  0.12 
M 8V  M ( M 7V )  M   
0.29  0.29 
   
0.19  0.19 

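Since the last two vectors agree to two decimal places, we stop the process: the PageRanks of the
four web sites page1.com, page2.com, page3.com and page4.com are 0.38, 0.12, 0.29 and 0.19.
Solving $v = Mv$ exactly with entries summing to 1 gives $v = (12/31,\ 4/31,\ 9/31,\ 6/31) \approx (0.387,\ 0.129,\ 0.290,\ 0.194)$; the small differences from the iterated values come from rounding each iterate to two decimals.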
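A minimal sketch of the same power iteration in Python, assuming the unmodified model used in this key (repeat v <- Mv from the uniform vector, with no damping or teleport term). The helper names and the tolerance `1e-9` are arbitrary choices for illustration. Iterating to a tight tolerance instead of two decimals shows the vector settling near (12/31, 4/31, 9/31, 6/31).

```python
# Transition matrix M from the answer above (column j = source page j+1).
M = [
    [0,   0,   1, 1/2],
    [1/3, 0,   0, 0  ],
    [1/3, 1/2, 0, 1/2],
    [1/3, 1/2, 0, 0  ],
]

def mat_vec(m, v):
    """Matrix-vector product m @ v for plain Python lists."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]

def power_iterate(m, tol=1e-9, max_iter=1000):
    """Repeat v <- M v from the uniform vector until successive vectors agree within tol."""
    n = len(m)
    v = [1 / n] * n
    for k in range(1, max_iter + 1):
        nxt = mat_vec(m, v)
        if max(abs(a - b) for a, b in zip(nxt, v)) < tol:
            return nxt, k
        v = nxt
    return v, max_iter

rank, steps = power_iterate(M)
print([round(x, 3) for x in rank], "after", steps, "steps")
```

The plain iteration converges here because the graph is strongly connected and has cycles of length 2 (1→3→1) and 3 (1→2→3→1), so it is aperiodic; graphs with dead ends or spider traps would need a taxed (damped) variant instead.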
2. The utility matrix M of user ratings for items is given below (a blank cell means the user has not rated that item):

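User/item | a | b | c | d | e | f | g | h
----------+---+---+---+---+---+---+---+---
A         | 4 | 5 |   | 5 | 1 |   | 3 | 2
B         |   | 3 | 4 | 3 | 1 | 2 | 1 |
C         | 2 |   | 1 | 3 |   | 4 | 5 | 3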
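(a) For a Boolean utility matrix (rated = 1, blank = 0), compute the Jaccard distance between each pair of users.
A rated {a, b, d, e, g, h}, B rated {b, c, d, e, f, g} and C rated {a, c, d, f, g, h}. Every pair shares 4 items out of a union of 8, so the Jaccard distance between (A,B), (A,C) and (B,C) is 1 - 4/8 = 0.5.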
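The Boolean view treats each user as the set of items they rated. A short Python sketch, where the `RATINGS` dictionary simply transcribes the utility matrix (a missing key is a blank) and `jaccard_distance` is a helper written for this example:

```python
from itertools import combinations

# Utility matrix from question 2; a missing item means the user left it blank.
RATINGS = {
    "A": {"a": 4, "b": 5, "d": 5, "e": 1, "g": 3, "h": 2},
    "B": {"b": 3, "c": 4, "d": 3, "e": 1, "f": 2, "g": 1},
    "C": {"a": 2, "c": 1, "d": 3, "f": 4, "g": 5, "h": 3},
}

def jaccard_distance(s, t):
    """1 - |intersection| / |union| of two sets."""
    return 1 - len(s & t) / len(s | t)

# Boolean view: each user becomes the set of items they rated.
rated = {user: set(r) for user, r in RATINGS.items()}
for u, w in combinations("ABC", 2):
    print(f"d({u},{w}) = {jaccard_distance(rated[u], rated[w])}")
```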

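(b) Compute the cosine distance between each pair of users.

Treating blanks as 0, the dot products are A·B = 34, A·C = 44 and B·C = 26, and the squared lengths are |A|² = 80, |B|² = 40 and |C|² = 64. The cosine of the angle between each pair is

cos(A,B) = 34/√(80 × 40) ≈ 0.601, cos(A,C) = 44/√(80 × 64) ≈ 0.615, cos(B,C) = 26/√(40 × 64) ≈ 0.514

A larger cosine means a smaller angle (a smaller cosine distance), so A and C are the closest pair and B and C the farthest.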
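The cosine values can be reproduced with the sketch below. It assumes, as the hand calculation does, that blanks count as 0; `vector` and `cosine` are illustrative helpers, not library functions.

```python
import math
from itertools import combinations

ITEMS = "abcdefgh"
# Utility matrix from question 2; a missing item means the user left it blank.
RATINGS = {
    "A": {"a": 4, "b": 5, "d": 5, "e": 1, "g": 3, "h": 2},
    "B": {"b": 3, "c": 4, "d": 3, "e": 1, "f": 2, "g": 1},
    "C": {"a": 2, "c": 1, "d": 3, "f": 4, "g": 5, "h": 3},
}

def vector(user):
    """Rating vector over items a..h, with blanks treated as 0."""
    return [RATINGS[user].get(item, 0) for item in ITEMS]

def cosine(x, y):
    """Cosine of the angle between vectors x and y."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

for u, w in combinations("ABC", 2):
    print(f"cos({u},{w}) = {cosine(vector(u), vector(w)):.3f}")
```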

(c) Normalize the matrix by subtracting from each nonblank entry the average value for its user.
Compute the cosine distance between each pair of users.
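User/item | a | b | c | d | e | f | g | h | Average
----------+---+---+---+---+---+---+---+---+--------
A         | 4 | 5 |   | 5 | 1 |   | 3 | 2 | 20/6
B         |   | 3 | 4 | 3 | 1 | 2 | 1 |   | 14/6
C         | 2 |   | 1 | 3 |   | 4 | 5 | 3 | 18/6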

The utility matrix after normalizing ratings:

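User/item | a     | b     | c     | d     | e     | f     | g     | h     | Average
----------+-------+-------+-------+-------+-------+-------+-------+-------+--------
A         | 4/6   | 10/6  |       | 10/6  | -14/6 |       | -2/6  | -8/6  | 20/6
B         |       | 4/6   | 10/6  | 4/6   | -8/6  | -2/6  | -8/6  |       | 14/6
C         | -6/6  |       | -12/6 | 0     |       | 6/6   | 12/6  | 0     | 18/6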
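Mean-centering can be done exactly with Python's `fractions.Fraction`. In this sketch (helper names are illustrative) each user's mean is taken over rated items only, and blanks stay blank; `Fraction` reduces 4/6 to 2/3, so the printed values are the table's entries in lowest terms.

```python
from fractions import Fraction

# Utility matrix from question 2; a missing item means the user left it blank.
RATINGS = {
    "A": {"a": 4, "b": 5, "d": 5, "e": 1, "g": 3, "h": 2},
    "B": {"b": 3, "c": 4, "d": 3, "e": 1, "f": 2, "g": 1},
    "C": {"a": 2, "c": 1, "d": 3, "f": 4, "g": 5, "h": 3},
}

def normalize(ratings):
    """Subtract each user's mean (over rated items) from that user's ratings."""
    result = {}
    for user, r in ratings.items():
        mean = Fraction(sum(r.values()), len(r))
        result[user] = {item: value - mean for item, value in r.items()}
    return result

norm = normalize(RATINGS)
for user, row in norm.items():
    print(user, {item: str(v) for item, v in row.items()})
```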
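Cosine of the angle between each pair of normalized users, treating blanks as 0. Multiplying every entry by 6 does not change a cosine, so using the numerators alone the dot products are A·B = 208, A·C = -48 and B·C = -228, and the squared lengths are |A|² = 480, |B|² = 264 and |C|² = 360:

AB = 208/√(480 × 264) ≈ 0.584, AC = -48/√(480 × 360) ≈ -0.115, BC = -228/√(264 × 360) ≈ -0.740

After normalization A and B remain similar, while C is weakly opposed to A and strongly opposed to B.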
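The normalized cosines can be checked the same way. This sketch centers each user's ratings on that user's mean and, as in the hand calculation, fills blanks with 0 (i.e. the user's own average). Helper names are illustrative. It should give roughly 0.584, -0.115 and -0.740 for AB, AC and BC.

```python
import math
from itertools import combinations

ITEMS = "abcdefgh"
# Utility matrix from question 2; a missing item means the user left it blank.
RATINGS = {
    "A": {"a": 4, "b": 5, "d": 5, "e": 1, "g": 3, "h": 2},
    "B": {"b": 3, "c": 4, "d": 3, "e": 1, "f": 2, "g": 1},
    "C": {"a": 2, "c": 1, "d": 3, "f": 4, "g": 5, "h": 3},
}

def centered_vector(user):
    """Subtract the user's mean (over rated items); blanks become 0."""
    ratings = RATINGS[user]
    mean = sum(ratings.values()) / len(ratings)
    return [ratings[item] - mean if item in ratings else 0.0 for item in ITEMS]

def cosine(x, y):
    """Cosine of the angle between vectors x and y."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))

for u, w in combinations("ABC", 2):
    c = cosine(centered_vector(u), centered_vector(w))
    print(f"cos({u},{w}) = {c:.3f}")
```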

(d) Recode the ratings 3, 4 and 5 as 1 and the ratings 1, 2 and blank as 0. Compute the cosine distance
between each pair of users.
User/item | a | b | c | d | e | f | g | h
----------+---+---+---+---+---+---+---+---
A         | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 0
B         | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0
C         | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 1
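The recoding rule is a simple threshold. A minimal sketch, where `recode` is an illustrative helper and `None` stands for a blank:

```python
ITEMS = "abcdefgh"
# Utility matrix from question 2; a missing item means the user left it blank.
RATINGS = {
    "A": {"a": 4, "b": 5, "d": 5, "e": 1, "g": 3, "h": 2},
    "B": {"b": 3, "c": 4, "d": 3, "e": 1, "f": 2, "g": 1},
    "C": {"a": 2, "c": 1, "d": 3, "f": 4, "g": 5, "h": 3},
}

def recode(rating):
    """Map 3, 4, 5 to 1 and 1, 2 or a blank (None) to 0."""
    return 1 if rating is not None and rating >= 3 else 0

recoded = {user: [recode(r.get(item)) for item in ITEMS] for user, r in RATINGS.items()}
for user, row in recoded.items():
    print(user, *row)
```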
The Jaccard distance between each pair of users, where A has 1s in {a, b, d, g}, B in {b, c, d} and C in {d, f, g, h}:

(A,B) = 1 - 2/5 = 3/5 = 0.6

(A,C) = 1 - 2/6 = 2/3 ≈ 0.67

(B,C) = 1 - 1/6 = 5/6 ≈ 0.83
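Treating each recoded row as the set of items marked 1, the Jaccard distances follow directly. This sketch uses `Fraction` so 2/3 and 5/6 stay exact; the `LIKED` sets are transcribed from the recoded table and the helper name is illustrative.

```python
from fractions import Fraction
from itertools import combinations

# Items recoded to 1 for each user, read off the recoded table.
LIKED = {
    "A": {"a", "b", "d", "g"},
    "B": {"b", "c", "d"},
    "C": {"d", "f", "g", "h"},
}

def jaccard_distance(s, t):
    """1 - |intersection| / |union|, kept as an exact fraction."""
    return 1 - Fraction(len(s & t), len(s | t))

for u, w in combinations("ABC", 2):
    d = jaccard_distance(LIKED[u], LIKED[w])
    print(f"d({u},{w}) = {d} (about {float(d):.2f})")
```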

Cosine similarity among A, B and C (A has four 1s, B three and C four):

Sim(A,B) = 2/(√4 × √3) = 1/√3 ≈ 0.5774

Sim(A,C) = 2/(√4 × √4) = 0.5

Sim(B,C) = 1/(√3 × √4) ≈ 0.2887
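For 0/1 vectors the dot product counts shared 1s and the squared length counts a user's 1s, which is exactly the arithmetic above. A sketch with the recoded rows hard-coded (`RECODED` and `cosine` are illustrative names):

```python
import math
from itertools import combinations

# Recoded rows over items a..h (1 = rating 3, 4 or 5; 0 = rating 1, 2 or blank).
RECODED = {
    "A": [1, 1, 0, 1, 0, 0, 1, 0],
    "B": [0, 1, 1, 1, 0, 0, 0, 0],
    "C": [0, 0, 0, 1, 0, 1, 1, 1],
}

def cosine(x, y):
    """Cosine of the angle between vectors x and y."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))

for u, w in combinations("ABC", 2):
    print(f"Sim({u},{w}) = {cosine(RECODED[u], RECODED[w]):.4f}")
```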
