Hanjun Dai, PhD Student, School of Computational Science and Engineering, Georgia Tech at MLconf ATL 2017
Graph Representation Learning with Deep Embedding Approach:
Graphs are a commonly used data structure for representing real-world relationships, e.g., molecular structures, knowledge graphs, and social and communication networks. Effective encoding of graphical information is essential to the success of such applications. In this talk I'll first describe a general deep learning framework, namely structure2vec, for end-to-end graph feature representation learning. Then I'll present direct applications of this model to graph problems at different scales, including community detection and molecular graph classification/regression. We then extend the embedding idea to temporally evolving user-product interaction graphs for recommendation. Finally I'll present our latest work on leveraging reinforcement learning techniques for graph combinatorial optimization, including the vertex cover problem for social influence maximization and the traveling salesman problem for scheduling management.
1. Graph Representation Learning
with Deep Embedding Approach
Hanjun Dai
Ph.D. student in School of Computational Science & Engineering
Georgia Institute of Technology
2. Graph applications
• Drug discovery: Dai et al., structure2vec, ICML 2016
• Recommendation: Dai et al., DeepCoevolve, RecSys DLRS 2016
• Knowledge graphs: Dai et al., VRN, in submission; Trivedi et al., Know-Evolve, ICML 2017
• TSP / Maxcut / Vertex cover: Dai et al., S2V-DQN, NIPS 2017
3. Outline
• Review of traditional approaches
• Our architecture
• Experiments on RNA and molecules
• Extension to social network and recommendation
• Application in graph combinatorial optimization
4. Review: RNA / Molecule property prediction
Application: high-throughput virtual screening.
Structured data → target:
• Power Conversion Efficiency (PCE): regression problem
• Binding affinity: binary classification problem
Handcrafted features: an RNA sequence such as C U U C A G is mapped to a bag-of-words count vector over its k-mers (AA, AC, AU, AG, …, GG, GU).
5. Review: RNA / Molecule property prediction
Problem with handcrafted features:
• Stage 1: build the kernel matrix 𝑘(𝜒𝑖, 𝜒𝑗), or a high-dimensional explicit bag-of-words feature map
• Stage 2: train a classifier on top
Not scalable to millions of data points, and the constructed features are not aware of the task.
6. Review: Temporal Recommendation
Who will do what, and when?
[Figure: users (David, Alice, Christine, Jacob) interacting with items (Shoe, Towel, Book) over time]
Matrix factorization: the user-item matrix 𝑅 is approximated by low-rank factors 𝑈 and 𝑉, with time 𝑡 divided into epochs.
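The matrix-factorization baseline on this slide can be sketched as follows. This is a minimal illustration, not the speaker's implementation: the toy ratings, learning rate, rank, and regularization are all assumptions; only the decomposition R ≈ U Vᵀ comes from the slide.

```python
# Toy matrix factorization: fit R ≈ U @ V.T on observed entries only.
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, rank = 4, 3, 2

# Hypothetical observed rating matrix (0 = unobserved).
R = np.array([[5, 0, 1],
              [4, 0, 0],
              [0, 1, 5],
              [1, 0, 4]], dtype=float)
mask = R > 0

U = 0.1 * rng.standard_normal((n_users, rank))
V = 0.1 * rng.standard_normal((n_items, rank))

lr, reg = 0.05, 0.01
for _ in range(2000):
    err = mask * (R - U @ V.T)       # error only on observed entries
    U += lr * (err @ V - reg * U)    # gradient step on user factors
    V += lr * (err.T @ U - reg * V)  # gradient step on item factors

rmse = np.sqrt((err[mask] ** 2).mean())
```

Predictions for unobserved entries are then read off `U @ V.T`; the epoch division on the slide corresponds to refitting (or updating) the factors per time window.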
7. Review: Graph Combinatorial Optimization
Minimum vertex/set cover: NP-hard problems.
Advertisers: influence maximization.
2-approximation for minimum vertex cover:
Repeat till all edges are covered:
• Select the uncovered edge with the largest total degree, and add both of its endpoints to the cover
A manually designed rule. Can we learn from data?
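The manually designed rule above can be sketched in a few lines. Assumptions to note: the slide only says "select uncovered edge with largest total degree"; adding both endpoints of the chosen edge is the standard step that yields the factor-2 guarantee, and the star-graph example is illustrative.

```python
# 2-approximation for minimum vertex cover: repeatedly pick an uncovered
# edge (here, the one with the largest total endpoint degree, per the
# slide) and add BOTH endpoints to the cover.

def vertex_cover_2approx(edges):
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1

    cover = set()
    uncovered = set(edges)
    while uncovered:
        # Greedy selection rule from the slide.
        u, v = max(uncovered, key=lambda e: degree[e[0]] + degree[e[1]])
        cover.update((u, v))
        uncovered = {e for e in uncovered
                     if e[0] not in cover and e[1] not in cover}
    return cover

# Star graph: the optimal cover is {0}; the 2-approx returns 2 nodes.
edges = [(0, 1), (0, 2), (0, 3), (0, 4)]
cover = vertex_cover_2approx(edges)
```

On the star graph the rule returns a cover of size 2 against an optimum of 1, exactly the factor-2 behavior; the talk's question is whether a learned rule can do better.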
8. Outline
• Review of traditional approaches
• Our architecture
• Experiments on RNA and molecules
• Extension to social network and recommendation
• Application in graph combinatorial optimization
8. Intuitive understanding: local filters
• Image: a filter applies to each local patch
• Graph: a filter applies to each 1-hop neighborhood
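The 1-hop analogy above can be made concrete with one matrix expression. This is a generic neighborhood-aggregation sketch, not the speaker's exact operator: the path graph, one-hot features, shared weight matrix, and ReLU are illustrative assumptions.

```python
# One "local filtering" step on a graph: each node aggregates its
# 1-hop neighbors' features, analogous to a conv filter on a patch.
import numpy as np

# Adjacency of a 4-node path graph: 0-1-2-3.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = np.eye(4)                   # one-hot node features
W = np.full((4, 4), 0.25)       # toy shared "filter" weights

# ReLU(A X W): node i's new feature mixes its neighbors' features.
H = np.maximum(A @ X @ W, 0.0)
```

Node 1 (two neighbors) ends up with twice the activation of node 0 (one neighbor), showing that the same "filter" W is slid over every neighborhood, just as a convolution slides over every patch.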
11. Outline
• Review of traditional approaches
• Our architecture
• Experiments on RNA and molecules
• Extension to social network and recommendation
• Application in graph combinatorial optimization
13. Experiment on Molecules
Dataset: Harvard Clean Energy Project; 2.3 million molecules, with 28 nodes and 33 edges on average.
Task: predict the Power Conversion Efficiency (PCE, 0-12%) of organic solar panel materials.

Method         | Test MAE | Test RMSE | # parameters
Mean predictor | 1.986    | 2.406     | 1
WL level-3     | 0.143    | 0.204     | 1.6M
WL level-6     | 0.096    | 0.137     | 1378M
DE-MF          | 0.091    | 0.125     | 0.1M
DE-LBP         | 0.085    | 0.117     | 0.1M
14. Experiment on Molecules
[Plot: MAE (0.085 to 0.280) vs. parameter count (0.1M to 1000M) for Embedded MF, Embedded BP, Weisfeiler-Lehman level 6, and hashed WL level 6]
Embedding reduces model size by 10,000x!
[Dai, Dai & Song 2016]
16. Outline
• Review of traditional approaches
• Our architecture
• Experiments on RNA and molecules
• Extension to social network and recommendation
• Application in graph combinatorial optimization
19. Experiment on network analysis
15
20
25
30
0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
Blogcatalog
5
9
13
17
21
0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9-Fraction of training-
Wikipedia
deepwalk
node2vec
gcn
s2v
Blogcatalog Wikipedia
# nodes 10,312 4,777
# edges 333,983 184,812
# classes 39 40
task Group membership POS tag
20. Dynamic Graphs for Recommendation
[Figure: a latent variable model (LVM) 𝐺 = (𝒱, ℰ) with user/item raw features 𝑋1 … 𝑋6, latent states 𝐻1 … 𝐻9, and interaction time/context along the timeline 𝑡0, 𝑡1, 𝑡2, 𝑡3]
[Dai, et al. 2016]
• Unroll the interactions along the timeline:
1. The bipartite interaction graph
2. The temporal ordering of events
• Mini-batch training using truncated backpropagation through time (BPTT)
22. Outline
• Review of traditional approaches
• Our architecture
• Experiments on RNA and molecules
• Extension to social network and recommendation
• Application in graph combinatorial optimization
23. Learning graph opt: Motivation
Minimum vertex/set cover:
• Branch and bound: exponential time complexity
• Constructive approximation: results too weak
Cannot learn from solved instances!
24. Learning graph opt: Motivation
• Data from the same / similar distribution:
Social network: Barabási-Albert model
Road network: fixed graph with evolving edge weights
(Images taken from Wikipedia)
• Supervised learning? No such supervision.
• Reinforcement learning! Learning by trial and error.
25. Learning graph opt: RL background
[Mnih, et al. Nature 2015]
• State 𝑺: current screen
• Action 𝒊: move your board left / right
• Reward 𝑹(𝒕): score earned at the current step
• Action-value function 𝑸(𝑺, 𝒊): your predicted total future reward
• Policy 𝝅(𝒔): how to choose your action
Greedy policy: 𝑖∗ = argmax𝑖 𝑄(𝑆, 𝑖)
26. Learning graph opt: RL on graphs
[Dai, et al. NIPS 2017]
Minimum vertex cover as an integer program:
min Σ𝑖∈𝒱 𝑥𝑖  s.t.  𝑥𝑖 + 𝑥𝑗 ≥ 1, ∀(𝑖, 𝑗) ∈ ℰ,  𝑥𝑖 ∈ {0, 1}
• State 𝑺: current partial solution
• Reward: 𝑟(𝑡) = −1
• Action-value function 𝑸(𝑺, 𝒊): expected negative future loss
• Greedy policy 𝝅(𝒔): add the best node
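The integer-programming formulation above is easy to check directly: a 0/1 assignment is feasible exactly when every edge constraint 𝑥𝑖 + 𝑥𝑗 ≥ 1 holds, and the objective is the number of chosen nodes. A small sketch (the helper names and the path-graph example are illustrative, not from the talk):

```python
# Feasibility and objective for the vertex-cover integer program:
# min sum_i x_i  s.t.  x_i + x_j >= 1 for every edge (i, j), x_i in {0,1}.

def is_vertex_cover(x, edges):
    """x: dict node -> 0/1. Feasible iff every edge has a chosen endpoint."""
    return all(x[i] + x[j] >= 1 for i, j in edges)

def cover_size(x):
    """Objective value: number of nodes in the cover."""
    return sum(x.values())

edges = [(0, 1), (1, 2), (2, 3)]   # a path graph
x = {0: 0, 1: 1, 2: 1, 3: 0}       # nodes 1 and 2 touch every edge
```

In the RL view, each action flips one 𝑥𝑖 from 0 to 1 and earns reward −1, so the return of a complete episode is exactly −cover_size(x).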
27. Learning graph opt: action-value function
• Parameterize the action-value function 𝑄(𝑆, 𝑖) with structure2vec
• Train with deep Q-learning (DQN), using the Bellman optimality equation
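The Bellman-based training signal can be shown with a tabular stand-in. In the paper 𝑄 is parameterized by structure2vec; here it is just a dictionary, and the two-state chain, learning rate, and reward of −1 per step are illustrative assumptions.

```python
# Fitted Q-learning: move Q(s, a) toward the Bellman target
#   r + gamma * max_a' Q(s', a').
gamma, lr = 1.0, 0.5
Q = {}  # (state, action) -> value; a toy stand-in for structure2vec

def td_update(s, a, r, s_next, actions_next):
    best_next = max((Q.get((s_next, a2), 0.0) for a2 in actions_next),
                    default=0.0)          # 0 future value at terminal states
    target = r + gamma * best_next        # Bellman optimality target
    old = Q.get((s, a), 0.0)
    Q[(s, a)] = old + lr * (target - old) # move Q toward the target

# Two-step chain A -> B -> end, reward -1 per step (one node added per step).
for _ in range(50):
    td_update("B", "go", -1.0, "end", [])
    td_update("A", "go", -1.0, "B", ["go"])
```

The values converge to Q(B) = −1 and Q(A) = −2, i.e. the (negative) number of remaining steps, matching the "expected negative future loss" reading on the previous slide.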
28. Learning graph opt: quantitative comparison
• A distribution of scale-free networks
• "Optimal" approximated by running CPLEX for 1 hour
• Approximation ratio ≈ 1
29. Learning graph opt: quantitative comparison
[Plot: generalization to large instances; approximation ratios between 1.000 and 1.007]
• Train on small graphs with 50-100 nodes
• Generalizes not only to graphs from the same distribution, but also to larger graphs
• Approximation ratio < 1.007
30. Learning graph opt: time-solution tradeoff
[Plot: solution quality vs. running time, comparing Embedded MF, CPLEX's 1st/2nd/3rd/4th feasible solutions, 2-approx, 2-approx+, and RNN]
• Generate 200 Barabási-Albert networks with 300 nodes
• Let CPLEX produce its 1st, 2nd, 3rd, and 4th feasible solutions
Embedding produces an algorithm with a good tradeoff!
31. Learning graph opt: real-world data
MemeTracker graph (http://snap.stanford.edu/netinf/#data): 960 nodes and 5000 edges.
• Learning from sampled subgraphs

Method           | Approximation ratio
Optimal          | 1.000 (473 nodes)
S2V-DQN          | 1.002 (474 nodes)
MVCApprox-Greedy | 1.222 (578 nodes)
MVCApprox        | 1.408 (666 nodes)
32. Learning graph opt: learned strategy
• Learned a greedy algorithm which is different from known ones
33. Learning graph opt: other problems
• Maximum cut
• Traveling Salesman Problem [figure: optimal solution found]
• Set cover: learn with a bipartite graph (set cover image taken from Wikipedia)
34. Thanks to my collaborators in this project
• Advisor: Le Song
• Collaborators (alphabetical order): Bo Dai, Bistra Dilkina, Elias Khalil, Rakshit Trivedi, Yichen Wang, Yuyu Zhang
Now it is time to put them together. We start with zero embeddings and then perform one step of local filtering. For example, to update mu_2, we use its neighbors' embeddings and its own input features. Similarly, we can compute updates for all node embeddings. We iterate this process several times; intuitively, this allows each embedding to capture more and more neighborhood information. In the last step, we merge the node embeddings to get a vector representation of the entire structured input. Note that this model can be trained in an end-to-end fashion.
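The iteration just described can be sketched in a few lines. This is a simplified caricature of structure2vec, not the paper's exact parameterization: the graph, feature dimensions, weight matrices W1/W2, the ReLU, and sum-pooling are all illustrative assumptions.

```python
# structure2vec-style iteration: start from zero embeddings, repeat a few
# rounds of 1-hop aggregation, then pool nodes into one graph vector.
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]], dtype=float)   # a 3-node star graph
X = rng.standard_normal((3, 4))          # node input features
d = 8                                    # embedding dimension
W1 = 0.1 * rng.standard_normal((4, d))   # maps input features
W2 = 0.1 * rng.standard_normal((d, d))   # maps neighbor embeddings

mu = np.zeros((3, d))                    # start with zero embeddings
for _ in range(3):                       # a few propagation rounds
    # Each node mixes its own features with its neighbors' embeddings.
    mu = np.maximum(X @ W1 + A @ mu @ W2, 0.0)

graph_embedding = mu.sum(axis=0)         # merge nodes -> whole-graph vector
```

After T rounds, each node's embedding depends on its T-hop neighborhood; since every operation is differentiable, gradients from a task loss on `graph_embedding` flow back through all rounds, which is the end-to-end training mentioned above.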
If we unroll the interactions between users and items, we can get a dependency graph which looks like this. Here the dependency comes from the assumption that the user or item's latent representation will change over time, after each interaction event. We can then abstract it as a directed graph on the right hand side.
Different from the previous scenario, here we get a single gigantic graph. So we apply stochastic training with truncated backpropagation through time, which is commonly used for recurrent neural networks.
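The chunking scheme can be sketched as follows. This is a conceptual forward-pass skeleton only: the toy event stream, RNN cell, truncation window, and weight shapes are assumptions, and the actual gradient computation is left as a comment.

```python
# Truncated BPTT over one long timeline: split the event sequence into
# chunks, carry the hidden state across chunks, but backpropagate only
# within each chunk.
import numpy as np

rng = np.random.default_rng(0)
T, k = 12, 4                          # sequence length, truncation window
events = rng.standard_normal((T, 3))  # toy stream of interaction features
Wh = 0.1 * rng.standard_normal((5, 5))
Wx = 0.1 * rng.standard_normal((3, 5))

h = np.zeros(5)
chunks = [events[t:t + k] for t in range(0, T, k)]
for chunk in chunks:
    # Truncation point: the incoming h is treated as a constant here
    # (in an autograd framework this would be h = h.detach()).
    for x in chunk:
        h = np.tanh(h @ Wh + x @ Wx)  # one RNN step within the chunk
    # ...compute the loss on this chunk and backprop through its k steps...
```

Carrying `h` forward preserves long-range state, while cutting gradients at chunk boundaries keeps each update's cost and memory bounded, which is what makes training on a single gigantic interaction graph feasible.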