Configuration / Metric | Value |
---|---|
Num. of layers (d) | 6 |
Num. of heads (h) | 2 |
Embedding dim. per head (e) | 64 |
Linear projection ratio (r) | 4 |
Image resolution (I) | 160 |
Patch size (p) | 16 |
FLOPs (G) | 0.15 |
Throughput on V100 (FPS) | 20086 |
Latency on Pixel3 (ms) | 30.05 |
Latency on TX2 (ms) | 4.42 |
Model | FLOPs (G) | Top-1 accuracy (%) | d | h | e | r | I | p |
---|---|---|---|---|---|---|---|---|
DeiT-Tiny | 1.26 | 74.5 | 12 | 3 | 64 | 4 | 224 | 16 |
DeiT-Scaled-Tiny | 1.22 | 76.4 ( \(\uparrow\) 1.9) | 14 | 4 | 64 | 4 | 160 | 16 |
DeiT-Small | 4.62 | 81.2 | 12 | 6 | 64 | 4 | 224 | 16 |
DeiT-Scaled-Small | 4.79 | 81.6 ( \(\uparrow\) 0.4) | 20 | 4 | 64 | 4 | 256 | 16 |
DeiT-Base | 17.66 | 83.4 | 12 | 12 | 64 | 4 | 224 | 16 |
DeiT-Scaled-Base | 16.82 | 83.8 ( \(\uparrow\) 0.4) | 20 | 6 | 64 | 4 | 320 | 16 |
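The configurations (d, h, e, r, I, p) in the table fully determine each model's compute cost: with embedding dimension D = h·e and n = (I/p)² + 1 tokens, a transformer layer spends roughly 4nD² + 2n²D multiply-accumulates on multi-head self-attention and 2rnD² on the MLP. A minimal sketch of this estimate (function name is ours; it counts one multiply-accumulate as one FLOP and ignores LayerNorm, softmax, biases, and the classifier head):

```python
def vit_flops(d, h, e, r, I, p):
    """Rough FLOP (multiply-accumulate) estimate for a plain ViT/DeiT."""
    D = h * e                                # embedding dimension
    n = (I // p) ** 2 + 1                    # patch tokens + class token
    patch_embed = (n - 1) * (3 * p * p) * D  # linear patch projection
    msa = 4 * n * D * D + 2 * n * n * D      # QKV + output proj, QK^T, attn@V
    mlp = 2 * n * D * (r * D)                # two linear layers, hidden = r*D
    return patch_embed + d * (msa + mlp)

print(vit_flops(12, 3, 64, 4, 224, 16) / 1e9)  # DeiT-Tiny config: ~1.25 G
print(vit_flops(12, 6, 64, 4, 224, 16) / 1e9)  # DeiT-Small config: ~4.60 G
```

Both estimates land within about 1% of the table's 1.26 G and 4.62 G; the small gap comes from the omitted minor operators.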
Model | FLOPs (G) | Top-1 accuracy (%) |
---|---|---|
DeiT-Tiny | 1.26 | 74.5 |
DeiT-Scaled-Tiny | 1.22 | 76.4 ( \(\uparrow\) 1.9) |
DeiT-Tiny/1000 epochs | 1.26 | 76.6 |
DeiT-Scaled-Tiny/1000 epochs | 1.22 | 78.3 ( \(\uparrow\) 1.7) |
DeiT-Small | 4.62 | 81.2 |
DeiT-Scaled-Small | 4.79 | 81.6 ( \(\uparrow\) 0.4) |
DeiT-Small/1000 epochs | 4.62 | 82.6 |
DeiT-Scaled-Small/1000 epochs | 4.79 | 82.9 ( \(\uparrow\) 0.3) |
Device | Deployment tool | Hardware-cost measurement tool | Target application |
---|---|---|---|
NVIDIA V100 | PyTorch | PyTorch profiler | Cloud services w/ strong GPUs |
NVIDIA Edge GPU TX2 | TensorRT | TensorRT command-line wrapper | Edge computing w/ weak GPUs |
Google Pixel3 | TFLite | TFLite benchmark tools | Mobile deployment w/o GPUs |
Model | Top-1 accuracy (%) | Latency on TX2 (ms) | Latency on Pixel3 (ms) | d | h | e | r | I | p |
---|---|---|---|---|---|---|---|---|---|
Pixel3 Scaling () | 74.8 | 20.91 | 181.07 | 16 | 2 | 108 | 4 | 160 | 16 |
TX2 Scaling () | 74.0 ( \(\downarrow\) 0.8) | 14.44 ( \(\downarrow\) 30.94%) | 275.06 ( \(\uparrow\) 51.90%) | 6 | 4 | 64 | 16 | 160 | 16 |
TX2 Scaling () | 78.2 | 23.70 | 456.41 | 10 | 4 | 64 | 16 | 160 | 16 |
Pixel3 Scaling () | 77.5 ( \(\downarrow\) 0.7) | 27.43 ( \(\uparrow\) 15.74%) | 297.58 ( \(\downarrow\) 34.80%) | 16 | 2 | 142 | 4 | 160 | 16 |
Model | Top-1 accuracy (%) | FLOPs (G) |
---|---|---|
PiT-Tiny | 74.6 | 0.71 |
PiT-Scaled-Tiny | 76.7 ( \(\uparrow\) 2.1) | 0.70 |
PiT-XS | 79.1 | 1.40 |
PiT-Scaled-XS | 79.5 ( \(\uparrow\) 0.4) | 1.38 |
PiT-Small | 81.9 | 2.9 |
PiT-Small (Reproduced) | 81.7 | 2.9 |
PiT-Scaled-Small | 81.8 ( \(\uparrow\) 0.1) | 3.0 |
Backbone | Average precision (%) | Throughput (FPS) on V100 |
---|---|---|
DeiT-Tiny | 35.0 | 13.31 |
DeiT-Scaled-Tiny | 35.7 ( \(\uparrow\) 0.7) | 13.05 |
DeiT-Small | 41.0 | 10.81 |
DeiT-Scaled-Small | 41.7 ( \(\uparrow\) 0.7) | 9.81 |
Attention Scheme | Model | Top-1 Accuracy (%) | FLOPs (G) |
---|---|---|---|
Joint | DeiT-Tiny | 67.7 | 19.9 |
 | DeiT-Scaled-Tiny | 67.4 ( \(\downarrow\) 0.3) | 13.3 ( \(\downarrow\) 33.2%) |
 | DeiT-Small | 71.2 | 56.5 |
 | DeiT-Scaled-Small | 71.4 ( \(\uparrow\) 0.2) | 61.9 ( \(\uparrow\) 9.56%) |
Divided | DeiT-Tiny | 68.4 | 13.6 |
 | DeiT-Scaled-Tiny | 67.8 ( \(\downarrow\) 0.6) | 12.7 ( \(\downarrow\) 6.62%) |
 | DeiT-Small | 71.4 | 50.8 |
 | DeiT-Scaled-Small | 72.0 ( \(\uparrow\) 0.6) | 54.2 ( \(\uparrow\) 6.69%) |
Model | FLOPs (G) | Top-1 accuracy (%) | I | Spatial size | d | h | e |
---|---|---|---|---|---|---|---|
DeiT-Tiny | 1.26 | 74.5 | 224 | 14 \(\times\) 14 | 12 | 3 | 64 |
DeiT-Scaled-Tiny | 1.22 | 76.4 ( \(\uparrow\) 1.9) | 160 | 10 \(\times\) 10 | 14 | 4 | 64 |
PiT-Tiny | 0.71 | 74.6 | 224 | 27 \(\times\) 27 | 2 | 2 | 32 |
 | | | | 14 \(\times\) 14 | 6 | 4 | 32 |
 | | | | 7 \(\times\) 7 | 4 | 8 | 32 |
PiT-Scaled-Tiny | 0.70 | 76.7 ( \(\uparrow\) 2.1) | 160 | 19 \(\times\) 19 | 2 | 3 | 32 |
 | | | | 10 \(\times\) 10 | 7 | 6 | 32 |
 | | | | 5 \(\times\) 5 | 4 | 12 | 32 |
PiT-XS | 1.41 | 79.1 | 224 | 27 \(\times\) 27 | 2 | 2 | 48 |
 | | | | 14 \(\times\) 14 | 6 | 4 | 48 |
 | | | | 7 \(\times\) 7 | 4 | 8 | 48 |
PiT-Scaled-XS | 1.38 | 79.5 ( \(\uparrow\) 0.4) | 160 | 19 \(\times\) 19 | 2 | 3 | 48 |
 | | | | 10 \(\times\) 10 | 6 | 6 | 48 |
 | | | | 5 \(\times\) 5 | 4 | 12 | 48 |
DeiT-Small | 4.62 | 81.2 | 224 | 14 \(\times\) 14 | 12 | 6 | 64 |
DeiT-Scaled-Small | 4.79 | 81.6 ( \(\uparrow\) 0.4) | 256 | 16 \(\times\) 16 | 20 | 4 | 64 |
PiT-Small | 2.90 | 81.7 | 224 | 27 \(\times\) 27 | 2 | 3 | 48 |
 | | | | 14 \(\times\) 14 | 6 | 6 | 48 |
 | | | | 7 \(\times\) 7 | 4 | 12 | 48 |
PiT-Scaled-Small | 3.04 | 81.8 ( \(\uparrow\) 0.1) | 256 | 31 \(\times\) 31 | 3 | 2 | 48 |
 | | | | 16 \(\times\) 16 | 10 | 4 | 48 |
 | | | | 8 \(\times\) 8 | 6 | 8 | 48 |
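The PiT token grids above follow directly from the input resolution. Consistent with the grids shown (27/14/7 at I = 224, 19/10/5 at I = 160, 31/16/8 at I = 256), PiT embeds patches with an overlapping size-16, stride-8 convolution and then halves the grid (rounding up) at each of its two pooling stages. A small sketch that reproduces the table's spatial sizes (helper name is ours):

```python
def pit_spatial_sizes(I, patch=16, stride=8, stages=3):
    """Token-grid side length at each PiT stage for input resolution I."""
    s = (I - patch) // stride + 1       # overlapping conv patch embedding
    sizes = [s]
    for _ in range(stages - 1):
        s = (s + 1) // 2                # stride-2 pooling, ceiling division
        sizes.append(s)
    return sizes

print(pit_spatial_sizes(224))  # [27, 14, 7]
print(pit_spatial_sizes(160))  # [19, 10, 5]
print(pit_spatial_sizes(256))  # [31, 16, 8]
```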
Specifications | NVIDIA V100 System (V100) | NVIDIA Edge GPU TX2 (TX2) | Google Pixel3 (Pixel3) |
---|---|---|---|
GPU Architecture | NVIDIA Volta | NVIDIA Pascal | Qualcomm Adreno |
CUDA Cores | 5120 | 256 | - |
CPU | AMD EPYC 7742 | NVIDIA Denver 2/ARM® Cortex®-A57 | Kryo 385 Gold/Kryo 385 Silver |
CPU Max Frequency | 3.4 GHz | 2 GHz/2 GHz | 2.8 GHz/1.7 GHz |
GPU/SoC Memory | 16 GB | 8 GB | 4 GB |
Power Consumption | 300 W | 15 W | 18 W |
Operators | 1/FPS on V100 (%) | Latency on TX2 (%) | Latency on Pixel3 (%) |
---|---|---|---|
MLP | 62.50 | 34.31 | 69.40 |
LayerNorm | 8.95 | 6.38 | 1.59 |
MSA-MatMul | 17.65 | 8.03 | 21.36 |
MSA-Softmax | 3.48 | 5.75 | 3.20 |
MSA-Reshape & Transpose | 5.17 | 6.32 | 3.30 |
MSA-Gather | \(\lt\) 0.01 | 36.38 | \(\lt\) 0.01 |
Others | 2.20 | 2.82 | 1.26 |
Metrics | V100 Scaling | TX2 Scaling |
---|---|---|
Accuracy (%) | 78.10 | 78.17 |
FPS on V100 | 2488.81 | 1984.10 |
Latency on TX2 (ms) | 25.18 | 23.70 |
Num. of layers (d) | 13 | 10 |
Num. of heads (h) | 5 | 4 |
Embedding dim. per head (e) | 64 | 64 |
Linear projection ratio (r) | 4 | 16 |
Image resolution (I) | 160 | 160 |
Patch size (p) | 16 | 16 |
Model | FLOPs (G) | Top-1 accuracy (%) | d | h | e | r | I | p |
---|---|---|---|---|---|---|---|---|
DeiT-Tiny | 1.26 | 74.5 | 12 | 3 | 64 | 4 | 224 | 16 |
DeiT-Scaled-Tiny | 1.22 | 76.4 ( \(\uparrow\) 1.9) | 14 | 4 | 64 | 4 | 160 | 16 |
DeiT-Scaled-Tiny-RP | 1.22 | 76.9 ( \(\uparrow\) 2.4) | 17 | 4 | 60 | 5 | 171 | 19 |
DeiT-Small | 4.62 | 81.2 | 12 | 6 | 64 | 4 | 224 | 16 |
DeiT-Scaled-Small | 4.79 | 81.6 ( \(\uparrow\) 0.4) | 20 | 4 | 64 | 4 | 256 | 16 |
DeiT-Scaled-Small-RP | 4.79 | 82.0 ( \(\uparrow\) 0.8) | 21 | 4 | 68 | 5 | 210 | 15 |
Association for Computing Machinery
New York, NY, United States