MaGNAS: A Mapping-Aware Graph Neural Architecture Search Framework for Heterogeneous MPSoC Deployment
Abstract
1 Introduction
1.1 Motivational Example
1.2 Novel Contributions
2 A Primer On Vision Graph Neural Network (ViG)
3 System Model and Problem Formulation
3.1 System Model for Mapping GNNs onto Heterogeneous SoCs
3.1.1 GNN Workload Characterization.
3.1.2 Performance Modelling.
3.1.3 Mapping Problem Formulation.
3.2 Nested Search Formulation
4 MaGNAS Framework
4.1 Supernet Construction and Training
4.1.1 ViG Superblocks.
4.1.2 𝔸 Search Parameters.
4.1.3 Supernet Training.
4.2 Nested Evolutionary Search: Outer Optimization Engine (OOE)
4.2.1 Subspace 𝔸 Description.
4.2.2 OOE Evolutionary Search.
4.3 Nested Evolutionary Search: Inner Optimization Engine (IOE)
4.3.1 Subspace 𝕄 Description.
4.3.2 IOE Evolutionary Search.
4.3.3 Constrained Search.
4.3.4 Performance Characterization.
4.3.5 DVFS Search Support.
5 Experiments
5.1 Experimental Setup
5.1.1 Supernet Design.
5.1.2 Datasets and Training.
5.1.3 Evolutionary Search Settings.
| Decision variables | Values | Cardinality |
|---|---|---|
| **Supernet Search Space (\(\mathbb{A}\))** | | |
| Superblock depth (d) | {2, 3, 4} | 3 |
| Graph Op | {Max-Relative, EdgeConv, GraphSAGE, GIN} | 4 |
| Skip pre-process (fc_use) | {False, True} | 2 |
| Skip post-process (ffn_use) | {False, True} | 2 |
| FFN hidden features (w) | {96, 192, 320} | 3 |
| **Mapping Search Space (\(\mathbb{M}\)) for NVIDIA Xavier AGX** | | |
| Computing units | {GPU, DLA} | 2 |
| Mapping granularity | {Stem, Grapher, FFN, Cls} | \(\mathcal{O}(1.7 \times 10^{12})\) |
| **DVFS Settings Search Space (\(\Psi\)) for NVIDIA Xavier AGX** | | |
| CPU clock frequency | {1728 MHz, 2265 MHz} | 2 |
| GPU clock frequency | {520 MHz, 900 MHz, 1377 MHz} | 3 |
| EMC clock frequency | {1065 MHz, 2133 MHz} | 2 |
| DLA clock frequency | {1050 MHz, 1395 MHz} | 2 |
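The per-superblock and DVFS cardinalities listed above can be reproduced with a minimal sketch, assuming the decision variables combine independently (the variable names below are illustrative, not from the framework's code):

```python
from math import prod

# Per-superblock architectural choices from the search-space table.
# Assumes the five decision variables combine independently per superblock.
arch_choices = {
    "depth": 3,       # d in {2, 3, 4}
    "graph_op": 4,    # Max-Relative, EdgeConv, GraphSAGE, GIN
    "fc_use": 2,      # skip pre-process: {False, True}
    "ffn_use": 2,     # skip post-process: {False, True}
    "ffn_width": 3,   # w in {96, 192, 320}
}
per_superblock = prod(arch_choices.values())  # 3 * 4 * 2 * 2 * 3 = 144

# DVFS configurations for the Xavier AGX: CPU x GPU x EMC x DLA clock levels.
dvfs_configs = 2 * 3 * 2 * 2  # 24

print(per_superblock, dvfs_configs)  # 144 24
```

The mapping-space cardinality of \(\mathcal{O}(1.7 \times 10^{12})\), by contrast, grows with the number of mappable units (Stem, per-block Grapher and FFN stages, Cls), each assignable to either CU.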
5.1.4 Hardware Experimental Settings.
5.1.5 Baselines.
5.2 OOE Results: GNN Architecture Optimization
5.3 IOE Results: Hardware Mapping Optimization
5.4 Analysis of Pareto Search and Models
5.4.1 Results Discussion.
Graph-Op abbreviations: M = Max-Relative, E = EdgeConv, G = GraphSAGE, S = GIN. For the baselines, the G:/D: prefixes denote GPU-only and DLA-only execution, respectively.

| Datasets | GNN Models | TOP-1 Acc (%) | Graph-Ops (M, E, G, S) | FFN-use (%) | FC pre-use (%) | Latency (ms) | Energy (mJ) | GPU-use (%) | DLA-use (%) |
|---|---|---|---|---|---|---|---|---|---|
| All datasets | \(\Phi\) Baseline-b0 | C10: 94.15, C100: 82.13, F: 89.71, Ti: 68.12 | M-M-M-M | 100 | 100 | G: 25.28, D: 40.11 | G: 459.44, D: 224.41 | - | - |
| | \(\bigstar\) Baseline-b1 | C10: 94.15, C100: 82.13, F: 90.29, Ti: 68.15 | E-E-E-E | 100 | 100 | G: 33.74, D: 62.11 | G: 770.36, D: 323.70 | - | - |
| | \(\bowtie\) Baseline-b2 | C10: 94.20, C100: 81.49, F: 86.37, Ti: 67.62 | G-G-G-G | 100 | 100 | G: 22.49, D: 39.62 | G: 429.07, D: 214.35 | - | - |
| | \(\Omega\) Baseline-b3 | C10: 94.27, C100: 82.10, F: 88.92, Ti: 68.32 | S-S-S-S | 100 | 100 | G: 29.57, D: 57.77 | G: 623.76, D: 263.48 | - | - |
| CIFAR-10 (C10) | \(\bigcirc\) Ours-a0 | 94.25 | G-G-G-G | 25 | 25 | 16.02 | 97.00 | 9 | 91 |
| | \(\bigcirc\) Ours-a1 | 94.46 | G-G-G-G | 100 | 0 | 19.49 | 118.00 | 17 | 83 |
| | \(\bigcirc\) Ours-a2 | 94.32 | G-M-G-G | 25 | 0 | 11.19 | 121.14 | 75 | 25 |
| | \(\bigcirc\) Ours-a3 | 94.32 | G-M-G-G | 25 | 0 | 14.18 | 105.11 | 33 | 67 |
| CIFAR-100 (C100) | \(\bigcirc\) Ours-a0 | 82.13 | S-G-S-G | 100 | 25 | 17.72 | 180.56 | 50 | 50 |
| | \(\bigcirc\) Ours-a1 | 82.17 | S-S-S-S | 100 | 75 | 34.72 | 271.62 | 30 | 70 |
| | \(\bigcirc\) Ours-a2 | 81.63 | G-G-G-G | 50 | 50 | 15.06 | 131.81 | 50 | 50 |
| | \(\bigcirc\) Ours-a3 | 82.13 | S-G-S-G | 100 | 25 | 17.29 | 197.80 | 55 | 45 |
| Oxford-Flowers (F) | \(\bigcirc\) Ours-a0 | 89.90 | M-G-M-M | 75 | 75 | 14.37 | 153.54 | 69 | 31 |
| | \(\bigcirc\) Ours-a1 | 88.43 | G-G-G-G | 0 | 50 | 9.60 | 119.07 | 90 | 10 |
| | \(\bigcirc\) Ours-a2 | 88.43 | G-G-G-G | 0 | 50 | 12.30 | 105.88 | 40 | 60 |
| | \(\bigcirc\) Ours-a3 | 89.02 | M-G-G-G | 25 | 25 | 12.82 | 116.63 | 50 | 50 |
| Tiny-ImageNet (Ti) | \(\bigcirc\) Ours-a0 | 68.40 | M-G-G-G | 25 | 0 | 13.07 | 114.89 | 50 | 50 |
| | \(\bigcirc\) Ours-a1 | 68.40 | M-G-G-G | 25 | 0 | 15.47 | 102.06 | 17 | 83 |
| | \(\bigcirc\) Ours-a2 | 68.51 | M-G-G-G | 75 | 25 | 16.37 | 122.56 | 38 | 62 |
| | \(\bigcirc\) Ours-a3 | 68.51 | M-G-G-G | 75 | 25 | 17.87 | 115.78 | 19 | 81 |
5.4.2 Hypervolume and Pareto Composition Analysis.
5.4.3 Analysis of GNN Workload Distribution.
| Mapping option | Stem | Grapher | FFN | Cls | #transit | Latency (ms) | Energy (mJ) |
|---|---|---|---|---|---|---|---|
| DLA-only | D | D-D-D-D-D-D-D-D | D-D-D-D-D-D-D-D | D | 0 | 25.56 | 121.74 |
| GPU-only | G | G-G-G-G-G-G-G-G | G-G-G-G-G-G-G-G | G | 0 | 13.42 | 273.22 |
| constr-transit1 | D | D-G-G-G-G-G-G-G | D-G-G-G-G-G-G-G | G | 1 | 16.31 | 232.60 |
| constr-transit1 | G | G-G-G-G-G-D-D-D | G-G-G-G-G-D-D-D | D | 1 | 17.42 | 226.79 |
| constr-transit2 | D | D-G-G-G-G-G-G-D | D-G-G-G-G-G-G-D | D | 2 | 17.58 | 220.23 |
| constr-transit2 | G | G-G-D-D-D-G-G-G | G-G-D-D-D-G-G-G | G | 2 | 17.11 | 227.15 |
| Ours (IOE) | D | G-G-G-G-G-G-G-G | G-D-D-D-D-G-D-D | D | 12 | 17.29 | 197.80 |
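The #transit column above can be reproduced with a small sketch, assuming the pipeline executes in the order Stem → (Grapher_i, FFN_i for each block i) → Cls, so a "transit" is any point where consecutive stages run on different CUs (the helper name below is illustrative):

```python
def count_transitions(stem, grapher, ffn, cls):
    """Count CU switches along the assumed execution order:
    Stem -> (Grapher_i, FFN_i) for each block i -> Cls.
    Each argument holds 'G' (GPU) or 'D' (DLA) assignments."""
    seq = [stem]
    for g, f in zip(grapher, ffn):
        seq += [g, f]
    seq.append(cls)
    # A transition occurs wherever adjacent stages map to different CUs.
    return sum(a != b for a, b in zip(seq, seq[1:]))

# "Ours (IOE)" row: Stem=D, Grapher all GPU, FFN = G-D-D-D-D-G-D-D, Cls=D.
print(count_transitions("D", "GGGGGGGG", "GDDDDGDD", "D"))  # 12
```

Under the same assumption, the constr-transit1 row (Stem=D, first block on DLA, rest on GPU) yields exactly one switch, matching the table.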
5.5 Constraint-aware Optimization
Average workload distribution across computing units versus the allowable latency increase ratio (%):

| Workload Distribution | 5 | 10 | 20 | 40 | 60 | 80 | 100 |
|---|---|---|---|---|---|---|---|
| Avg. GPU utilization | 0.97 | 0.91 | 0.74 | 0.56 | 0.50 | 0.50 | 0.50 |
| Avg. DLA utilization | 0.03 | 0.09 | 0.26 | 0.44 | 0.50 | 0.50 | 0.50 |
Average workload distribution across computing units versus the available power budget (mW):

| Workload Distribution | 10 | 15 | 20 | 25 | 30 |
|---|---|---|---|---|---|
| Avg. GPU utilization | 0.74 | 0.76 | 0.88 | 0.88 | 0.81 |
| Avg. DLA utilization | 0.26 | 0.24 | 0.13 | 0.13 | 0.19 |
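The latency constraint driving these distributions can be sketched as a simple feasibility check, assuming a candidate mapping is kept only if its latency stays within the allowable increase over a baseline (the helper name and the use of the GPU-only 13.42 ms figure as the baseline are assumptions for illustration):

```python
def within_latency_slack(latency_ms, base_latency_ms, allowed_increase_pct):
    """Feasibility check for latency-constrained mapping search:
    keep a candidate only if its latency does not exceed the baseline
    by more than the allowable increase ratio."""
    return latency_ms <= base_latency_ms * (1 + allowed_increase_pct / 100)

# Example: against a 13.42 ms GPU-only baseline, a 17.29 ms mapping
# fits a 40% slack but violates a 20% slack.
print(within_latency_slack(17.29, 13.42, 40))  # True
print(within_latency_slack(17.29, 13.42, 20))  # False
```

This matches the trend in the table: tighter slack forces more work onto the faster GPU, while looser slack lets the search offload to the more energy-efficient DLA.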
5.6 Ablation Study on the Impact of DVFS
5.7 Generality and Scalability
5.7.1 On the ViG Architectural Level.
5.7.2 On the Hardware CU Level.
5.7.3 On the Power of Evolution.
6 Discussion and Future Directions
7 Related Works
| | [14] | [13] | [47] | [28] | [45] | [46] | MaGNAS |
|---|---|---|---|---|---|---|---|
| Training-in-the-loop NAS | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | | | |
| Once-for-all NAS | | | | | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| Vision GNN | | | | | | | \(\checkmark\) |
| Hardware Awareness | \(\checkmark\) | \(\checkmark\) | | | | | \(\checkmark\) |
| GNN-Hardware co-design | | | | | | | \(\checkmark\) |
| Edge Computing Setting | \(\checkmark\) | | | | | | \(\checkmark\) |
| Distributed Mapping | | | | | | | \(\checkmark\) |
8 Conclusion
Acknowledgement
References
Published by the Association for Computing Machinery, New York, NY, United States.