Graph Representation Learning
with Deep Embedding Approach
Hanjun Dai
Ph.D. student in School of Computational Science & Engineering
Georgia Institute of Technology
Graph applications
Drug discovery: Dai et al., structure2vec, ICML 2016
Recommendation: Dai et al., DeepCoevolve, Recsys DLRS 2016
Knowledge Graph: Dai et al., VRN, in submission; Trivedi et al., Know-Evolve, ICML 2017
TSP / Maxcut / Vertex Cover: Dai et al., S2V-DQN, NIPS 2017
Outline
• Review of traditional approaches
• Our architecture
• Experiments on RNA and molecules
• Extension to social network and recommendation
• Application in graph combinatorial optimization
Review: RNA / Molecule property prediction
Application: High-throughput virtual screening
[Figure: RNA sequences such as C U U C A G with their secondary structures,
mapped to handcrafted bag-of-words feature vectors, e.g. counts
0 2 0 0 … 0 2 over the 2-mers AA AC AU AG … GG GU]
Structured data → Target:
• Binding affinity: binary classification problem
• Power Conversion Efficiency (PCE): regression problem
Problem with handcrafted features
• Stage 1: build a kernel matrix 𝑘(𝜒𝑖, 𝜒𝑗), or a high-dimensional explicit
bag-of-words feature map (e.g. 2-mer counts 0 2 0 0 … 0 2 over AA AC AU AG … GG GU)
• Stage 2: train a classifier on top
Not scalable for millions of data points
Constructed features are not aware of the task
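As a concrete illustration (not code from the talk), the handcrafted bag-of-words feature is just a table of k-mer counts over the RNA alphabet:

```python
from itertools import product

# Count every length-k subsequence (k-mer) of an RNA sequence over A, C, G, U.
# This is the "handcrafted feature" from the slide: a fixed count vector that
# is built once and is not aware of the downstream task.
def kmer_features(seq, k=2):
    alphabet = "ACGU"
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = {km: 0 for km in kmers}          # one slot per possible k-mer
    for i in range(len(seq) - k + 1):
        counts[seq[i:i + k]] += 1
    return counts

feats = kmer_features("CUUCAG")   # the example sequence from the slide
# e.g. feats["UU"] == 1 and feats["CA"] == 1
```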
Review: Temporal Recommendation
who will do what and when?
[Figure: rating matrix 𝑅 with users (Christine, Alice, David, Jacob) as rows
and items (Towel, Shoe, Book) as columns]
Matrix factorization: 𝑅 ≈ 𝑈𝑉
Epoch division along time 𝑡
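The factorization 𝑅 ≈ 𝑈𝑉 can be sketched with a plain gradient-descent fit on observed entries; the rank, learning rate, and regularizer below are illustrative choices, not the talk's:

```python
import numpy as np

# Illustrative sketch (not the talk's model): fit R ≈ U @ V.T by gradient
# descent on the squared error over observed entries only.
def matrix_factorize(R, rank=2, iters=3000, lr=0.01, lam=0.1, seed=0):
    rng = np.random.default_rng(seed)
    U = rng.normal(scale=0.1, size=(R.shape[0], rank))
    V = rng.normal(scale=0.1, size=(R.shape[1], rank))
    mask = ~np.isnan(R)                   # observed entries
    R0 = np.nan_to_num(R)
    for _ in range(iters):
        E = mask * (R0 - U @ V.T)         # residual on observed entries
        U, V = U + lr * (E @ V - lam * U), V + lr * (E.T @ U - lam * V)
    return U, V

R = np.array([[5.0, 3.0, np.nan],
              [4.0, np.nan, 1.0],
              [1.0, 1.0, 5.0]])           # NaN = unobserved rating
U, V = matrix_factorize(R)
pred = U @ V.T                            # reconstructed / predicted ratings
```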
Minimum vertex/set cover
Advertisers: influence maximization
Review: Graph Combinatorial Optimization
2-approximation for minimum vertex cover:
repeat till all edges are covered:
• Select the uncovered edge with the largest total degree
Manually designed rule. Can we learn from data?
NP-hard problems
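The manually designed rule can be sketched as follows; adding both endpoints of the chosen edge (the step that yields the factor-2 guarantee, left implicit on the slide) is standard:

```python
# Sketch of the 2-approximation for minimum vertex cover described on the
# slide: repeatedly pick the uncovered edge whose endpoints have the largest
# total degree, and add both endpoints to the cover.
def vertex_cover_2approx(edges):
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    cover = set()
    uncovered = set(edges)
    while uncovered:
        # uncovered edge with the largest total endpoint degree
        u, v = max(uncovered, key=lambda e: degree[e[0]] + degree[e[1]])
        cover |= {u, v}
        uncovered = {e for e in uncovered if cover.isdisjoint(e)}
    return cover

# Star graph: optimal cover is {0}; the 2-approx returns {0, one leaf}.
cover = vertex_cover_2approx([(0, 1), (0, 2), (0, 3)])
```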
Outline
• Review of traditional approaches
• Our architecture
• Experiments on RNA and molecules
• Extension to social network and recommendation
• Application in graph combinatorial optimization
Intuitive understanding: local filters
• Image: filter applies to each local patch
• Graph: filter applies to each 1-hop neighborhood
[Figure: graph with input features 𝜒 = (𝑋1 … 𝑋4), latent variables 𝐻1 … 𝐻4,
and node embeddings 𝜇1 … 𝜇4 updated over iterations]
Iteration 1: 𝜇𝑣^(1) computed from the initial embeddings 𝜇𝑢^(0) of each node's
1-hop neighborhood and the input features
……
Iteration 𝑇: 𝜇𝑣^(𝑇)
End to end learning
Aggregate: 𝜇1^(𝑇) + 𝜇2^(𝑇) + ⋯ = 𝜇 = 𝑎(𝑊, 𝜒)
Label 𝑦: classification/regression with parameter 𝑉
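The iterative embedding computation can be sketched as follows; the tanh parameterization and weight shapes are illustrative assumptions, not the exact structure2vec update:

```python
import numpy as np

# Minimal sketch of the idea on the slide: each node embedding mu_v is
# updated for T rounds from its features x_v and its 1-hop neighbors'
# embeddings, then summed into a graph-level representation for
# classification/regression. Parameterization here is hypothetical.
def embed_graph(adj, X, W1, W2, T=3):
    mu = np.zeros((X.shape[0], W1.shape[1]))   # mu^(0) = 0
    for _ in range(T):
        neigh = adj @ mu                       # aggregate neighbor embeddings
        mu = np.tanh(X @ W1 + neigh @ W2)      # mu^(t+1)
    return mu.sum(axis=0)                      # aggregate: mu = sum_v mu_v^(T)

rng = np.random.default_rng(0)
adj = np.array([[0, 1, 1, 0],
                [1, 0, 1, 0],
                [1, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)    # 4-node graph
X = rng.normal(size=(4, 5))                    # node features
W1, W2 = rng.normal(size=(5, 8)), rng.normal(size=(8, 8))
mu = embed_graph(adj, X, W1, W2)               # 8-dim graph embedding
```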
Outline
• Review of traditional approaches
• Our architecture
• Experiments on RNA and molecules
• Extension to social network and recommendation
• Application in graph combinatorial optimization
Experiment on RNA sequences
• MITOMI 2.0
• 28 TFs
• Regression problem
[Bar charts: test MAE (0–0.12) per transcription factor, Ace2 through Mcm1 and
Met31 through Yap7, for PWM, LM, SVR, DNN, CNN, FS, and S2V]

       S2V    Cmp
RMSE   0.035  0.039
PCC    0.62   0.45
SCC    0.29   0.26

Dai et al., Bioinformatics 2017
Experiment on Molecules
Dataset: Harvard Clean Energy Project
Size: 2.3 million; avg node #: 28; avg edge #: 33
Task: predict Power Conversion Efficiency (PCE, 0–12 %) of organic solar panel materials

                Test MAE  Test RMSE  # parameters
Mean predictor  1.986     2.406      1
WL level-3      0.143     0.204      1.6 m
WL level-6      0.096     0.137      1378 m
DE-MF           0.091     0.125      0.1 m
DE-LBP          0.085     0.117      0.1 m
Experiment on Molecules
[Plot: MAE (0.085–0.280) vs. parameter count (0.1M–1000M) for Embedded MF,
Embedded BP, Weisfeiler-Lehman level 6, and hashed WL level 6]
Embedding reduces model size by 10,000x!
[Dai, Dai & Song 2016]
Interpretable results
Effective (> 10) vs. ineffective (< 0.5)
• Sequence motifs
• Molecule fragments
Outline
• Review of traditional approaches
• Our architecture
• Experiments on RNA and molecules
• Extension to social network and recommendation
• Application in graph combinatorial optimization
Extension to network analysis
ℒ = Σ𝑣∈𝒱 loss(𝑦𝑣, 𝑦(𝜇𝑣))
• One loss term per node
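A sketch of this node-level objective, using an illustrative linear read-out for 𝑦(𝜇𝑣) (the read-out is an assumption, not the paper's parameterization):

```python
import numpy as np

# Node-level objective from the slide: instead of one label per graph,
# each node v contributes a term loss(y_v, y(mu_v)) — here squared error
# on a linear read-out of the node embeddings.
def node_level_loss(mu, y, w):
    preds = mu @ w                      # y(mu_v) for every node v
    return np.mean((preds - y) ** 2)    # one loss term per node, averaged

mu = np.array([[0.1, 0.2], [0.3, 0.1], [0.0, 0.5]])  # node embeddings
y = np.array([0.5, 0.7, 1.0])                         # per-node labels
w = np.array([1.0, 2.0])                              # illustrative read-out
loss = node_level_loss(mu, y, w)
```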
Experiment on network analysis
[Bar chart: classification accuracy (40%–90%) on citation networks Citeseer,
Cora, and Pubmed for DeepWalk, node2vec, and structure2vec]

            Citeseer  Cora   Pubmed
# nodes     3,327     2,708  19,717
# edges     4,732     5,429  44,338
# classes   6         7      3
Label rate  0.036     0.052  0.003
Features    3,703     1,433  500

Document classification in citation networks
Experiment on network analysis
[Plots: score vs. fraction of training data (0.1–0.9) on Blogcatalog and
Wikipedia for deepwalk, node2vec, gcn, and s2v]

           Blogcatalog       Wikipedia
# nodes    10,312            4,777
# edges    333,983           184,812
# classes  39                40
Task       Group membership  POS tag
Dynamic Graphs for Recommendation [Dai, et al. 2016]
[Figure: user/item raw features 𝑋1 … 𝑋6 and latent variables 𝐻1 … 𝐻9 of an
LVM on a graph 𝐺 = (𝒱, ℇ); interactions with time/context unrolled along the
timeline 𝑡0, 𝑡1, 𝑡2, 𝑡3]
• Unroll the interactions along the timeline, representing:
  1. The bipartite interaction graph
  2. The temporal ordering of events
• Mini-batch training using truncated back-propagation through time (BPTT)
Experiment on recommendation
Favors 'recurrent' events: watching TV programs, discussing on forums, visiting restaurants
Outline
• Review of traditional approaches
• Our architecture
• Experiments on RNA and molecules
• Extension to social network and recommendation
• Application in graph combinatorial optimization
Learning graph opt: Motivation
Minimum vertex/set cover
• Branch and bound: exponential time complexity
• Constructive approximation: results too weak
Cannot learn from solved instances!
Learning graph opt: Motivation
• Data from same / similar distribution
Social network: Barabási–Albert
Road network: fixed graph with
evolving edge weights
Images taken from Wikipedia
• Supervised learning? No such supervision
• Reinforcement learning! Learning by trial and error
Learning graph opt: RL background
[Mnih, et al. Nature 2015]
Greedy policy:
𝑖∗ = 𝑎𝑟𝑔𝑚𝑎𝑥𝑖 𝑄(𝑆, 𝑖)
• State 𝑺: current screen
• Reward 𝑹(𝒕): score you earned at current step
• Action value function 𝑸(𝑺, 𝒊):
your predicted future total rewards
• Action 𝒊: move your paddle left / right
• Policy 𝝅(𝒔): How to choose your action
Learning graph opt: RL on graphs
[Dai, et al. NIPS 2017]
min𝑥𝑖∈{0,1} Σ𝑖∈𝓥 𝑥𝑖   s.t. 𝑥𝑖 + 𝑥𝑗 ≥ 1, ∀(𝑖, 𝑗) ∈ 𝓔
Reward: 𝑟(𝑡) = −1
• State 𝑺:
current partial solution
• Action value function 𝑸(𝑺, 𝒊):
Expected negative future loss
• Greedy Policy 𝝅(𝒔):
Add best node
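A toy rollout of this MDP; the hand-coded scoring function below merely stands in for the learned 𝑄(𝑆, 𝑖):

```python
# MDP for minimum vertex cover as on the slide: the state is the current
# partial solution, each action adds a node, and every step pays reward -1
# (so maximizing return minimizes cover size).
def run_greedy_policy(edges, q_fn):
    nodes = {u for e in edges for u in e}
    state = set()                                   # current partial solution
    total_reward = 0
    while any(state.isdisjoint(e) for e in edges):  # edges still uncovered
        candidates = nodes - state
        best = max(candidates, key=lambda i: q_fn(state, i))  # argmax_i Q(S,i)
        state.add(best)
        total_reward += -1                          # r(t) = -1 per node added
    return state, total_reward

# Hand-coded stand-in for Q: prefer nodes that cover more uncovered edges.
edges = [(0, 1), (0, 2), (0, 3), (2, 3)]
q = lambda S, i: sum(1 for e in edges if i in e and S.isdisjoint(e))
cover, ret = run_greedy_policy(edges, q)
```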
Learning graph opt: action-value function
• Parameterize Action value function 𝑄(𝑆, 𝑖) with structure2vec
• Train with Deep Q-Learning (DQN)
Bellman optimality equation
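The Bellman-optimality-based update behind (deep) Q-learning, sketched in tabular form; in S2V-DQN the function 𝑄 is parameterized by structure2vec, but a plain table on a toy two-state chain is enough to show the update rule:

```python
# One Q-learning step from the Bellman optimality equation:
#   Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
def q_learning_step(Q, s, a, r, s_next, actions, alpha=0.5, gamma=0.9):
    target = r + gamma * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])

actions = ["left", "right"]
Q = {(s, a): 0.0 for s in [0, 1] for a in actions}
# one observed transition: in state 0, "right" moves to state 1 with reward 1
q_learning_step(Q, s=0, a="right", r=1.0, s_next=1, actions=actions)
# Q[(0, "right")] moves halfway (alpha=0.5) toward the target 1.0
```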
Learning graph opt: quantitative comparison
approximation ratio ≈ 1
• A distribution of scale-free networks
• Optimal approximated by running CPLEX for 1 hour
[Bar chart: approximation ratios between 1.000 and 1.007, generalization to large instances]
Learning graph opt: quantitative comparison
• Train on small graphs with 50-100 nodes
• Generalize not only to graphs from the same distribution, but also to larger graphs
• Approximation ratio < 1.007
Learning graph opt: time-solution tradeoff
[Scatter plot: solution quality vs. running time for Embedded MF, CPLEX 1st
through 4th feasible solutions, 2-approx, 2-approx+, and an RNN baseline]
Embedding produces an algorithm with a good tradeoff!
• Generate 200 Barabási-Albert networks with 300 nodes
• Let CPLEX produce the 1st, 2nd, 3rd, and 4th feasible solutions
Learning graph opt: real-world data
http://snap.stanford.edu/netinf/#data
MemeTracker graph: 960 nodes and 5000 edges
Methods           Approximation ratio
Optimal           1.000 (473 nodes)
S2V-DQN           1.002 (474 nodes)
MVCApprox-Greedy  1.222 (578 nodes)
MVCApprox         1.408 (666 nodes)
• Learning from sampled
subgraphs
Learning graph opt: learned strategy
• Learned a greedy algorithm which is different from known ones
Learning graph opt: other problems
• Maximum cut
• Traveling Salesman Problem [Figure: solution found vs. optimal tour]
• Set cover: learn with the bipartite graph (set cover image taken from Wikipedia)
Thanks to my collaborators in this project
• Advisor:
Le Song
• Collaborators (alphabetical order):
Bo Dai, Bistra Dilkina, Elias Khalil, Rakshit Trivedi, Yichen Wang, Yuyu Zhang
Q&A

Hanjun Dai, PhD Student, School of Computational Science and Engineering, Georgia Tech at MLconf ATL 2017


Editor's Notes

  1. Now is the time to put them together. We start with the zero embeddings and then perform one step of local filtering. For example, to update 𝜇2 we use its neighborhood embeddings and its input features. Similarly, we can update all node embeddings. We iterate this process several times; intuitively, this allows each embedding to capture more and more neighborhood information. In the last step, we merge the embeddings into a vector representation of the entire structured input. The model can be trained in an end-to-end fashion.
  2. If we unroll the interactions between users and items, we get a dependency graph like this. The dependency comes from the assumption that a user's or item's latent representation changes over time, after each interaction event. We can then abstract it as a directed graph on the right-hand side. Different from the previous scenario, here we get a single gigantic graph, so we apply stochastic training with truncated backpropagation through time, as commonly used in recurrent neural networks.