Robust Multimodal 3D Object Detection via Modality-Agnostic Decoding and Proximity-based Modality Ensemble
Juhan Cha*, Minseok Joo*, Jihwan Park, Sanghyeok Lee, Injae Kim, Hyunwoo J. Kim†.
Follow the instructions below to set up MEFormer. Requirements:
- Python 3.8
- CUDA 11.1
- PyTorch 1.10
git clone https://github.com/hanchaa/MEFormer.git
cd MEFormer
conda create -n MEFormer python=3.8
conda activate MEFormer
pip install torch==1.10.1+cu111 torchvision==0.11.2+cu111 torchaudio==0.10.1 -f https://download.pytorch.org/whl/cu111/torch_stable.html
pip install openmim
mim install mmcv-full==1.6.0
pip install -r requirements.txt
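Once the environment is active, the versions above can be checked from the shell. The sketch below is a hypothetical helper, not part of the repository: `version_matches` and `check_env` are names introduced here, and it assumes `python` resolves to the MEFormer conda environment. Note that `torch.version.cuda` reports the CUDA version PyTorch was built with, not the system toolkit version.

```shell
# version_matches ACTUAL EXPECTED (hypothetical helper)
# Succeeds when ACTUAL equals EXPECTED or extends it with a "." or "+" suffix,
# e.g. "1.10.1+cu111" matches "1.10", but "1.10.1" does not match "1.1".
version_matches() {
    case "$1" in
        "$2" | "$2".* | "$2"+*) return 0 ;;
        *) return 1 ;;
    esac
}

# check_env (hypothetical helper): compare the active interpreter against the
# listed requirements (Python 3.8, CUDA 11.1, PyTorch 1.10) and mmcv-full 1.6.0.
check_env() {
    py_ver=$(python -c 'import platform; print(platform.python_version())') || return 1
    torch_ver=$(python -c 'import torch; print(torch.__version__)') || return 1
    cuda_ver=$(python -c 'import torch; print(torch.version.cuda)') || return 1
    mmcv_ver=$(python -c 'import mmcv; print(mmcv.__version__)') || return 1
    status=0
    version_matches "$py_ver" 3.8 || { echo "Python $py_ver, expected 3.8"; status=1; }
    version_matches "$torch_ver" 1.10 || { echo "PyTorch $torch_ver, expected 1.10"; status=1; }
    version_matches "$cuda_ver" 11.1 || { echo "CUDA (PyTorch build) $cuda_ver, expected 11.1"; status=1; }
    version_matches "$mmcv_ver" 1.6.0 || { echo "mmcv-full $mmcv_ver, expected 1.6.0"; status=1; }
    [ "$status" -eq 0 ] && echo "Environment matches the listed requirements."
    return "$status"
}
```

After `conda activate MEFormer`, source the snippet and run `check_env`; it prints each mismatch and returns a non-zero status if any version differs.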
Download the pretrained image backbone weights from Google Drive and move them to the ckpts directory:
MEFormer
├─ ckpts
│ ├─ fcos3d_vovnet_imgbackbone-remapped.pth
│ └─ nuim_r50.pth
├─ figures
├─ projects
└─ tools
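To confirm the downloads landed where the tree above expects them, a small shell check can help. `check_ckpts` is a hypothetical helper (not shipped with MEFormer); it only tests that each file exists and is non-empty, not that it is a valid checkpoint.

```shell
# check_ckpts [DIR] (hypothetical helper): report which of the two backbone
# checkpoints from the tree above are missing or empty in DIR (default: ckpts).
# Exit status is 0 only when both files are present and non-empty.
check_ckpts() {
    dir=${1:-ckpts}
    missing=0
    for f in fcos3d_vovnet_imgbackbone-remapped.pth nuim_r50.pth; do
        if [ ! -s "$dir/$f" ]; then
            echo "missing or empty: $dir/$f"
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]
}
```

Run it from the repository root as `check_ckpts`; a partially downloaded (zero-byte) file is reported the same way as a missing one.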
Prepare the nuScenes dataset by following the mmdet3d data preparation instructions.
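Before running the mmdet3d preparation step, it can help to confirm that the raw nuScenes download is unpacked in the layout mmdet3d documents (`maps`, `samples`, `sweeps` and `v1.0-trainval` under one root). The helper below is a hypothetical sketch; the default root `data/nuscenes` is an assumption based on mmdet3d's convention, so pass the actual path if your configs use a different `data_root`. It checks only the raw folders, not the info files the preparation step generates.

```shell
# check_nuscenes [ROOT] (hypothetical helper): verify the raw nuScenes folders
# exist under ROOT (default: data/nuscenes, an assumed mmdet3d-style location).
check_nuscenes() {
    root=${1:-data/nuscenes}
    status=0
    for d in maps samples sweeps v1.0-trainval; do
        if [ ! -d "$root/$d" ]; then
            echo "missing directory: $root/$d"
            status=1
        fi
    done
    return "$status"
}
```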
Train with 8 GPUs:
tools/dist_train.sh $path_to_config$ 8
Evaluate a trained checkpoint with 8 GPUs:
tools/dist_test.sh $path_to_config$ $path_to_weight$ 8 --eval bbox
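The training and evaluation commands differ only in their positional arguments (config, checkpoint for evaluation, number of GPUs). The hypothetical `launch` function below, which is not part of the repository, wraps both, checks that the config and checkpoint files exist, and with `DRY_RUN=1` (a variable introduced here) prints the command instead of running it. It assumes it is run from the repository root, since the script paths are relative.

```shell
# launch MODE ARGS... (hypothetical wrapper, run from the MEFormer repo root)
#   launch train CONFIG NUM_GPUS
#   launch test  CONFIG CHECKPOINT NUM_GPUS
# With DRY_RUN=1 the command is printed instead of executed.
launch() {
    [ "$#" -ge 1 ] || { echo "usage: launch train|test ..." >&2; return 2; }
    mode=$1
    shift
    case "$mode" in
        train)
            [ "$#" -eq 2 ] || { echo "usage: launch train CONFIG NUM_GPUS" >&2; return 2; }
            set -- tools/dist_train.sh "$1" "$2"
            ;;
        test)
            [ "$#" -eq 3 ] || { echo "usage: launch test CONFIG CHECKPOINT NUM_GPUS" >&2; return 2; }
            [ -f "$2" ] || { echo "checkpoint not found: $2" >&2; return 1; }
            set -- tools/dist_test.sh "$1" "$2" "$3" --eval bbox
            ;;
        *)
            echo "unknown mode: $mode (expected train or test)" >&2
            return 2
            ;;
    esac
    # After 'set --', $1 is the launch script and $2 is the config path.
    [ -f "$2" ] || { echo "config not found: $2" >&2; return 1; }
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$@"
    else
        "$@"
    fi
}
```

For example, `DRY_RUN=1 launch test path/to/config.py path/to/weights.pth 8` (placeholder paths) prints the evaluation command; dropping `DRY_RUN=1` executes it.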
Results on the nuScenes validation set (PME: Proximity-based Modality Ensemble).
Config | NDS | mAP | Schedule | FPS | Weights |
---|---|---|---|---|---|
MEFormer | 73.9% | 71.5% | 6 epochs* | 3.1 | Google Drive |
MEFormer w/o PME | 73.7% | 71.3% | 20 epochs | 3.4 | Google Drive |
FPS is measured with a single NVIDIA A6000 GPU.
* Train MEFormer w/o PME first; MEFormer with PME must be trained only after that model has finished training.
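The footnote fixes the order for reproducing the MEFormer row: MEFormer w/o PME is trained to completion before MEFormer with PME. The sketch below encodes only that order. `train_two_stage` and `DRY_RUN` are hypothetical names, the two config paths are placeholders, and the script assumes the PME config itself specifies how the stage-1 weights are loaded; check that config before running, because this script does not pass any checkpoint between stages.

```shell
# train_two_stage CONFIG_WO_PME CONFIG_PME [NUM_GPUS] (hypothetical helper)
# Run from the MEFormer repo root. Stage 1 trains MEFormer w/o PME; stage 2
# (MEFormer with PME) starts only if stage 1 exits successfully.
train_two_stage() {
    [ "$#" -ge 2 ] || { echo "usage: train_two_stage CONFIG_WO_PME CONFIG_PME [NUM_GPUS]" >&2; return 2; }
    gpus=${3:-8}
    # Validate both configs up front so a typo in stage 2 is caught before a long stage-1 run.
    for cfg in "$1" "$2"; do
        [ -f "$cfg" ] || { echo "config not found: $cfg" >&2; return 1; }
    done
    for cfg in "$1" "$2"; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "tools/dist_train.sh $cfg $gpus"
        else
            tools/dist_train.sh "$cfg" "$gpus" || { echo "training failed: $cfg" >&2; return 1; }
        fi
    done
}
```

Validating both configs before starting matters here because stage 1 alone is a 20-epoch run.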