Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                

05 Networks

Download as pdf or txt
Download as pdf or txt
You are on page 1of 48

Networks

COMP7507
Visualization & Visual Analytics
Network Size
• Size of the WWW: ~60 billion pages
(http://www.worldwidewebsize.com/ )

[ Opte project]

2
Network size
• Facebook: 2.23 billion monthly active users
• Twitter: 330 million monthly active users
• Weibo: 360 million monthly active users
• QQ: 861 million monthly active users
(Data from:http://expandedramblings.com/index.php/resource-how-many-people-use-the-top-social-media/
updated August 2018)

3
[Paul Butler, http://paulbutler.org/archives/visualizing-facebook-friends/ ]
Graph Drawing
• Direct calculation based on graph structure
– Spanning tree
– Adjacency matrix layout
• Optimization-based
– Optimizing the graph aesthetic constraints
– Force-directed layout

4
Spanning Tree Layout
• Many graphs have tree-like structure or useful
spanning trees (i.e., trees that include all
vertices but only some edges of the original
graph)
– WWW, Social Networks

Graph Minimum Spanning Tree


[wikipedia] 5
Spanning Tree Layout
• To extract a spanning tree from a graph and
visualize the tree (which is efficient)
• Drawing of a graph is in general non-
deterministic, and a spanning tree layout offers
predictability
• Spanning Tree can be obtained by
– Breadth-First Search (BFS) / Depth-First Search (DFS)
– Min/max spanning tree

6
Adjacency Matrices

Les Misérables Co-occurrence

[ http://bost.ocks.org/mike/miserables/ ]

7
Node-Link Layout
• Severe edge crossings and cluttering

8
Optimization Techniques
• Formulate the layout problem as an optimization
problem
• Governed by an equation incorporating
– the costs of items (e.g., distance between nodes) to
be optimized; and
– some constraints (e.g., no edge crossings allowed)
• Non-deterministic and therefore resulting layout
is unpredictable
• Force-Directed Layout is commonly used for
undirected graph

9
Force Directed Layout

[ http://philogb.github.io/jit/static/v20/Jit/Examples/ForceDirected/example1.html ]

10
Force Directed Layout
• To model nodes and edges of a graph as physical
bodies tied with springs
• Two principles:
– Vertices connected by an edge should be drawn near
each other.
– Vertices should not be drawn too close to each other.
• Edges = springs (attractive force)
– Hooke’s Law: F = k * x x: length of spring

• Nodes = charged particles (repulsive force)


– Coulomb’s Law: F = − k * q1 * q2 / x2
k: constant; x: distance between two particles; q1,q2: charges
11
Force Directed Layout
• Initialize node positions (e.g., random placement)
• Do this iteratively until stabilized:
1. Calculate forces for nodes
2. Update node positions

• Different physical models lead to layouts of varying


quality 12
Fruchterman-Reingold Algorithm
• A force directed layout algorithm
• Calculate an optimal distance between nodes,
δ = C * sqrt( area / |V| ),
where C is a constant
• Attractive force
fa(x) = d2 / δ
• Repulsive force
fr(x) = − δ2 / d
d: distance between two nodes

Fruchterman, T. M. J., & Reingold, E. M. (1991). Graph Drawing by Force-


Directed Placement. Software: Practice and Experience, 21(11). 13
Filtering
• To reduce the number of visible elements in a
graph being viewed
– Improves clarity
– Increases performance of layout and rendering
• Removing some “redundant” edges to better
reveal network topology, and to facilitate the
clustering process

14
Filtering
• E.g., Edges linking you and your friends in your
Facebook network
• Ego network

[Hansen 2011] 15
Filtering
Results of Fruchterman-Reingold layout on
the 2007 US Senate voting data with and without edge filtering.

Without filtering Edges corresponding to


% agreement < 0.65 are filtered
Node color represents party affiliation of a senator

16
Clustering
• Structure-based clustering
– Use only structural information (e.g., edge
connections) of a graph
• Content-based clustering
– Use semantic data associated with graph elements
– Application specific

17
Clustering
• It is natural to choose the clustering with the least
number of edges between members
– or with the minimum total weight of the edges
connecting members for graphs with weighted edges
• Force directed layout algorithms
can also form clusters naturally

18
Edge Bundling
• Clustering of edges instead of nodes
• To reduce cluttering

• Examples:
– Geometry-Based Edge Clustering Structure-based

[Cui et al., "Geometry-Based Edge Clustering for Graph Visualization," TVCG, 2008]

– Hierarchical Edge Bundles


19
Hierarchical Edge Bundles
• Some data sets are with both non-hierarchical
(adjacency) and hierarchical relations.
– Source codes for a software:
• Adjacency: dependencies of classes
• Hierarchical: directories -> files -> classes
– Social networks:
• Adjacency : if people are acquainted and nature of relations
• Hierarchical: circles or groups of people -> individuals
– Citation networks:
• Adjacency: citations among publications
• Hierarchical: institutions -> departments -> publications

20
Hierarchical Edge Bundles
• Visualization examples without edge bundling

Colored edges representing adjacency relations on (left) balloon trees, and (right) tree maps.

[Holten, "Hierarchical Edge Bundles: Visualization of Adjacency Relations in Hierarchical Data," TVCG , 2006]
21
Hierarchical Edge Bundles
• Bundle adjacency edges along tree hierarchy
1. Tree drawing for hierarchical relations
2. Add adjacency edge (P0, P4) to graph

Find path that goes through least common ancestor (LCA) Draw curved adjacency edge that ”aligns" to tree
22
Hierarchical Edge Bundles
• Examples with hierarchical edge bundling

23
Hierarchical Edge Bundles

Tree structure is used only for


[http://mbostock.github.io/d3/talk/20111116/bundle.html] edge clustering and is not shown
24
Understanding a Network
• Visualization alone cannot provide full
understanding of a graph or network
• Network graph metrics are quantitative
measures for describing a network,
characterizing a subgroup or some specific nodes
within a network
– Influential people in a social network (e.g., celebrities
in Twitter or Weibo)
– Gatekeepers connecting communities (e.g., for
headhunting in LinkedIn)

25
Understanding a Network
• Studying network metrics over time helps
understand how a network evolves
• Network metrics help improve the visual display
of a network
– By assigning different visual attributes to the nodes
– By filtering and showing only the important nodes

26
Network Metrics
• Overall graph metrics
– Graph type; # of vertices; # of edges
– Self loops (e.g., a person replying his own emails)
– Connected components
– Isolated vertices
– Maximum geodesic distance (aka diameter)
• i.e., the distance between two nodes that are farthest apart
– Average geodesic distance
• i.e., average distance from one node to another through the
graph edges
– Graph density
• i.e., # edges ÷ # of possible edges
– etc.
For undirected graph, graph density = (2 * |E|) / (|V| * (|V|-1))

For directed graph, graph density = |E| / (|V| * (|V|-1))


27
Overall Graph Metrics

• Max. geodesic dist.


= 4
• Avg. geodesic dist.
= 1.98
• Graph density
= 18/45 = 0.4

28
Network Metrics
• Node metrics: structure-based metrics associated with a
node
– Degree (in-degree, out-degree)
– Betweenness centrality
– Closeness centrality
– Clustering coefficient
– Eigenvector centrality
– PageRank
• Use for identifying special or important nodes or
subgroups
• There are edge metrics as well (e.g., edge betweenness)

29
Degree Centrality
• A node is considered more important if it is of a
greater degree
• Centrality means
“Importance”

30
Betweenness Centrality
• Measures the importance of a person in passing
information within a network

Who is more important in terms


of betweenness, Ed or Heather?

Intuitively, Heather is of higher


betweenness centrality than Ed
because removing Heather results in
isolated nodes

31
Betweenness Centrality
• The betweenness centrality of a node:
# of shortest path
between s and t passing through v

Total # of shortest path


between s and t

• Equivalent to computing the all-pairs shortest


path of a graph ― Complexity: O(|V|3)

O(|V||E|) on unweighted sparse graph


Ulrik Brandes, “A Faster Algorithm for Betweenness Centrality ”,
Journal of Mathematical Sociology, 2001.

32
Betweenness Centrality

Vertex size representing betweenness

33
Closeness Centrality
• Measures how close a person is to the others
– how fast a message can reach all others from a
person
• What is the scenario for a person’s message to
reach all others in the fastest way?
• The closeness centrality of a node:

Farness of v:
Total distance between
v and all other nodes

34
Closeness Centrality

Fernando and Garth are of


the highest closeness measures

Vertex size representing closeness

35
Clustering Coefficient
• Measures how well a person’s friends are
connected to each other

Ed’s three friends know


each other

36
Clustering Coefficient
• Clustering coefficient of a node:

# edges connecting u’s neighbours


C(u) =
total # possible edges connecting u’s neighbours

If u has k neighbours, this equals # of edges


in a k-vertex clique (complete graph)
= k (k – 1) / 2 for an undirected graph
= k (k – 1) for a directed graph
• Ranged in [0,1]
– C(u) = 0: no edges among neighbours
– C(u) = 1: neighbours form a clique
37
Clustering Coefficient

Vertex size representing


clustering coefficient

38
Eigenvector Centrality
• Measures the influence of a node in a network
– having a connection to an important person is more
important
• deg(Healther)
= deg(Ed) = 3
• Ed connects with Diane
who is the most popular
(i.e., having the largest
degree)
• Heather connects to Ike,
who is among the least
popular
• Hence, Ed’s eigenvector
centrality is higher

39
Eigenvector Centrality
• The eigenvector centrality score of a node is:

: constant = 1 iff u, v are neighbours,


Sum of scores over all u's neighbours otherwise =0

• Written in matrix form gives:


: adjacency matrix

Since it is required that all scores (i.e., entries of x) be positive,


the eigenvector corresponding to the largest eigenvalue will give us the scores.
40
Eigenvector Centrality

Vertex size representing


eigenvector centrality

41
PageRank
• Used by Google search engine to rank websites
in their search results.
• Idea: More important websites are likely to
receive more links from other websites.
• A variant of eigenvector centrality
• Score of a page = the probability of being
brought to a page after many clicks.

42
PageRank

• The percentage shows


the likelihood that a
page can be reached
after many clicks.
• Assumption: A user
start on a random page,
and has 85% chance of
clicking randomly on
any link in the page,
and 15% chance of
http://en.wikipedia.org/wiki/PageRank jumping randomly to
any other page in the
WWW. 43
PageRank
• An assumption that there is possibility a surfer
will stop following links and jump instead to a
random page

: probability of following a link : total number of nodes


: probability of jumping to a random page : out degree of

• Can be used for ranking nodes in a general


network.
44
Visual Attributes Mapping
• Map ordered visual attributes to ordered data
and unordered visual attributes to categorical
data
• Ordered data: Node degree, centrality
– Ordered visual attributes: Size, line width, opacity
• Categorical data: Gender, affiliation
– Unordered visual attributes: Color, shape

45
Visual Attributes Mapping

[Hansen 2011]

shape gender size betweenness


color cluster opacity eigenvector centrality

46
Tools and Datasets
• Tools for Graph Visualization
– Gephi
https://gephi.org
– NodeXL
http://nodexl.codeplex.com

• Large Network Datasets


– A Citation Network Dataset
http://arnetminer.org/citation
– Stanford Network Analysis Project
http://snap.stanford.edu

47
Reference
• Ivan Herman, Guy Melançon, M. Scott Marshall, “Graph
Visualization and Navigation in Information
Visualization: A Survey”, IEEE Trans. Vis. Comput. Graph,
6 (1), 2000, pp. 24-43.
• Matthew Ward, Georges Grinstein and Daniel Keim,
"Interactive Data Visualization: Foundations, Techniques,
and Applications", 2010 [Chapter 8]
• Hansen, Shneiderman and Smith, “Analyzing Social
Media Networks with NodeXL: Insights from a
Connected World”, 2011.
• Isabel F. Cruz and Roberto Tamassia, “Graph Drawing
Tutorial” (http://cs.brown.edu/~rt/papers/gd-tutorial/gd-
constraints.pdf)

48

You might also like