Third-prize solution in the FPGA track of the 2020 Low-Power Computer Vision Challenge (LPCVC). Designed by
Water Group, State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing; University of Chinese Academy of Sciences.
Shengwen Liang, Rick Lee
2020-08-20
The CVPR_2020_DAY12_4thead_9 folder contains our award-winning solution as submitted to LPCVC.
- TF 2.2 environment for training the model
  - TensorFlow version = 2.2.0
  - NVIDIA Titan GPUs
  - ImageNet dataset (TFRecord format)
- Xilinx/Vitis-ai-cpu:docker
  - TensorFlow version = 1.15.2
  - Keras version = 2.2.5
  - Python version >= 3.6
  - ImageNet dataset
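
Before running the quantization and compilation steps, it can help to confirm that the container matches the versions listed above. The following is a minimal sketch, not part of the original solution: the function name `check_environment` and the way the installed versions are passed in are assumptions made for illustration.

```python
import sys

# Versions listed above for the Xilinx/Vitis-ai-cpu:docker environment.
REQUIRED_PACKAGES = {"tensorflow": "1.15.2", "keras": "2.2.5"}
MIN_PYTHON = (3, 6)


def check_environment(installed, python_version=None):
    """Return a list of human-readable mismatches (empty when everything matches).

    `installed` maps a package name to its version string,
    for example {"tensorflow": tensorflow.__version__}.
    """
    if python_version is None:
        python_version = sys.version_info[:2]
    major, minor = tuple(python_version)[:2]

    problems = []
    if (major, minor) < MIN_PYTHON:
        problems.append("Python %d.%d is older than 3.6" % (major, minor))
    for name, wanted in REQUIRED_PACKAGES.items():
        found = installed.get(name)  # None when the package was not reported
        if found != wanted:
            problems.append("%s: expected %s, found %s" % (name, wanted, found))
    return problems


if __name__ == "__main__":
    # Example values only; inside the container these would come from
    # tensorflow.__version__ and keras.__version__.
    print(check_environment({"tensorflow": "1.15.2", "keras": "2.2.5"}))
```

An empty list means the environment matches; any entries describe what differs.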
B2304, 200/400MHz, RAM_USAGE_LOW, RELU_LEAKYRELU_RELU6, POOL_AVG_ENABLE, DWCV_ENABLE, DSP USAGE LOW.
Considering the trade-off between latency and accuracy, we use efficientnet-lite0 as the baseline model. However, the original efficientnet-lite0 architecture provided by Google does not meet the requirements of the DPU with the B2304, high-RAM configuration. We therefore modified the architecture of efficientnet-lite0 so that the model can be deployed on the Ultra96-V2 board with the DPU (B2304, high RAM).
1. Run the following commands to train the modified efficientnet-lite0 model.
cd tpu/efficientnet/
sh dpu_train.sh
We trained the efficientnet-lite0 model with TensorFlow 2.2.0 and obtained the best accuracy (72.499%) at training step 2893563. The corresponding checkpoint is provided in the checkpoint file.
- Generate frozen.pb file using checkpoint file
cd tpu/efficientnet/
python frozen.py
cp tpu/efficientnet/frozen.pb deploy/tf_efficientnet_lite_imagenet_224_224/float/
cd deploy/tf_efficientnet_lite_imagenet_224_224/code/eval/
sh 3_tf_eval_frozen_graph.sh
- Quantize frozen.pb
sh 4_tf_quantize.sh
sh 5_tf_eval_quantize_graph.sh
- Compile the quantized model to generate the ELF file for the DPU. The resulting dpu_efficientnet.elf file is written to the deploy_output folder.
sh 6_tf_compile_for_v2.sh
- Copy the generated file to the /CVPR_2020_DAY12_4thead_9/model/ folder, and run test_accuracy_latency.ipynb to test accuracy and latency on the Ultra96-V2 board. The number of threads is 4.
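
The accuracy and latency test in the last step can be sketched in plain Python as follows. This is an illustrative outline only, not the contents of the notebook: `evaluate` and `fake_predict` are hypothetical names, and `fake_predict` stands in for the actual DPU inference call, which is not reproduced here. The sketch runs predictions on 4 threads, then reports top-1 accuracy and the average wall-clock time per image.

```python
import time
from concurrent.futures import ThreadPoolExecutor

NUM_THREADS = 4  # thread count stated in this README


def evaluate(predict, samples, num_threads=NUM_THREADS):
    """Run predict(image) over (image, label) pairs on a thread pool.

    Returns (top1_accuracy, average_wall_clock_seconds_per_image).
    """
    samples = list(samples)
    if not samples:
        raise ValueError("samples must not be empty")

    images = [image for image, _ in samples]
    labels = [label for _, label in samples]

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # pool.map preserves input order, so predictions line up with labels.
        predictions = list(pool.map(predict, images))
    elapsed = time.perf_counter() - start

    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(samples), elapsed / len(samples)


def fake_predict(image):
    # Hypothetical stand-in for DPU inference: the "image" is an integer
    # and the predicted class is its last decimal digit.
    return image % 10


if __name__ == "__main__":
    data = [(i, i % 10) for i in range(100)]
    accuracy, seconds_per_image = evaluate(fake_predict, data)
    print("top-1 accuracy: %.3f" % accuracy)
    print("average time per image: %.3f ms" % (seconds_per_image * 1e3))
```

Note that the reported time is total wall-clock time divided by the number of images, so it reflects throughput across all threads and not the latency of a single inference.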
We provide a Docker image on Docker Hub that contains all source code except the ImageNet dataset. You can start this Docker image using docker_run.sh.
The Ultra96-V2 board requires a PMIC firmware update. Without this update, the DPU causes a peak current that exceeds the default 4A fault threshold, which asserts the power-on reset and reboots the board.
Apache License 2.0