This repository is the official implementation of gSDF: Geometry-Driven Signed Distance Functions for 3D Hand-Object Reconstruction. It also provides implementations for Grasping Field and AlignSDF. Project webpage: https://zerchen.github.io/projects/gsdf.html
Abstract: Signed distance functions (SDFs) are an attractive framework that has recently shown promising results for 3D shape reconstruction from images. SDFs seamlessly generalize to different shape resolutions and topologies but lack explicit modelling of the underlying 3D geometry. In this work, we exploit the hand structure and use it as guidance for SDF-based shape reconstruction. In particular, we address reconstruction of hands and manipulated objects from monocular RGB images. To this end, we estimate poses of hands and objects and use them to guide 3D reconstruction. More specifically, we predict kinematic chains of pose transformations and align SDFs with highly-articulated hand poses. We improve the visual features of 3D points with geometry alignment and further leverage temporal information to enhance the robustness to occlusion and motion blur. We conduct extensive experiments on the challenging ObMan and DexYCB benchmarks and demonstrate significant improvements of the proposed method over the state of the art.
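As background: an SDF maps a 3D query point to its signed distance from the surface, so a mesh at any resolution can be extracted from the zero level set with marching cubes. Below is a minimal, generic sketch of this idea, not the repository's code; `sdf_fn`, the grid resolution, and the sampling bound are illustrative assumptions.

```python
import numpy as np
from skimage import measure  # pip install scikit-image

def extract_mesh(sdf_fn, resolution=64, bound=1.0):
    """Evaluate an SDF on a dense grid and extract its zero level set.

    sdf_fn: any callable mapping an (N, 3) array of 3D points to an (N,)
    array of signed distances (negative inside, positive outside).
    """
    xs = np.linspace(-bound, bound, resolution)
    grid = np.stack(np.meshgrid(xs, xs, xs, indexing="ij"), axis=-1)
    sdf = sdf_fn(grid.reshape(-1, 3)).reshape(resolution, resolution, resolution)
    # Marching cubes recovers the surface at SDF = 0; any grid resolution works,
    # which is why SDFs generalize across shape resolutions.
    verts, faces, _, _ = measure.marching_cubes(sdf, level=0.0)
    # Map voxel indices back into world coordinates.
    verts = verts / (resolution - 1) * 2 * bound - bound
    return verts, faces
```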
Please follow the instructions below to set up the environment.
conda create -n gsdf python=3.9
conda activate gsdf
conda install pytorch==1.9.0 torchvision==0.10.0 torchaudio==0.9.0 cudatoolkit=11.3 -c pytorch -c conda-forge
pip install -r requirements.txt
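To quickly verify that the installation works and PyTorch can see the GPU (an optional sanity check, not part of the repository's instructions):

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"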
- ObMan dataset preparation.
  - Download the ObMan data from the official website.
  - Set up a soft link from the download path to ${ROOT}/datasets/obman/data.
  - Download the processed SDF files and JSON files.
  - Run ${ROOT}/preprocess/cocoify_obman.py to generate LMDB training files (example commands follow this list). The data organization looks like this:

    ${ROOT}/datasets/obman
    ├── splits
    │   ├── obman_train.json
    │   └── obman_test.json
    ├── obman.py
    └── data
        ├── val
        ├── train
        │   ├── rgb
        │   ├── rgb.lmdb
        │   ├── sdf_hand
        │   ├── sdf_hand.lmdb
        │   ├── sdf_obj
        │   └── sdf_obj.lmdb
        └── test
            ├── rgb
            ├── mesh_hand
            └── mesh_obj
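For example, assuming the ObMan archive was extracted to /path/to/obman (an illustrative path), the soft link and preprocessing steps could look like:

ln -s /path/to/obman ${ROOT}/datasets/obman/data
cd ${ROOT}/preprocess
python cocoify_obman.py

The DexYCB preparation below follows the same pattern with cocoify_dexycb.py.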
- DexYCB dataset preparation.
  - Download the DexYCB data from the official webpage.
  - Set up a soft link from the download path to ${ROOT}/datasets/dexycb/data.
  - Download the processed SDF files and JSON files.
  - Run ${ROOT}/preprocess/cocoify_dexycb.py to generate LMDB training files (a sanity-check snippet follows this list). The data organization looks like this:

    ${ROOT}/datasets/dexycb
    ├── splits
    │   ├── dexycb_train_s0.json
    │   └── dexycb_test_s0.json
    ├── toolkit
    ├── dexycb.py
    └── data
        ├── 20200709-subject-01
        ├── .
        ├── .
        ├── 20201022-subject-10
        ├── bop
        ├── models
        ├── mesh_data
        ├── sdf_data
        ├── rgb_s0.lmdb
        ├── sdf_hand_s0.lmdb
        └── sdf_obj_s0.lmdb
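To sanity-check the generated LMDB files, you can iterate over a few entries. This is a minimal sketch; the path is an example, and the exact key and value encoding depend on the preprocessing scripts.

```python
import lmdb

# Example path; point this at any of the generated .lmdb files.
env = lmdb.open("datasets/dexycb/data/rgb_s0.lmdb", readonly=True, lock=False)
with env.begin() as txn:
    for i, (key, value) in enumerate(txn.cursor()):
        print(key, len(value))  # key and byte size of each stored record
        if i >= 4:
            break
```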
- Create the output directory and move to the tools directory:

  mkdir ${ROOT}/outputs
  cd ${ROOT}/tools

- ${ROOT}/playground provides implementations of different models (a sketch of pixel-aligned feature sampling follows this list):

  ${ROOT}/playground
  ├── pose_kpt                  # the pose estimation component of gSDF
  ├── hsdf_osdf_1net            # SDF network with a single backbone, like Grasping Field or AlignSDF
  ├── hsdf_osdf_2net            # SDF network with two backbones, like gSDF
  ├── hsdf_osdf_2net_pa         # hsdf_osdf_2net plus pixel-aligned visual features
  └── hsdf_osdf_2net_video_pa   # hsdf_osdf_2net_pa plus a spatial-temporal transformer over multiple frames
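For reference, "pixel-aligned" features are obtained by projecting each 3D query point into the image and bilinearly sampling the backbone feature map at that pixel. Below is a minimal sketch of the idea, not the repository's exact code; `feat_map`, `points`, the intrinsics `K`, and `image_size` are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def pixel_aligned_features(feat_map, points, K, image_size):
    """feat_map: (1, C, H, W) backbone features; points: (N, 3) camera-space
    points; K: (3, 3) pinhole intrinsics; image_size: (width, height) in px."""
    # Project 3D points to pixel coordinates with the pinhole camera model.
    uvw = points @ K.T                     # (N, 3)
    uv = uvw[:, :2] / uvw[:, 2:3]          # perspective divide -> (N, 2)
    # Normalize pixel coordinates to [-1, 1] as expected by grid_sample.
    w, h = image_size
    grid = torch.stack([uv[:, 0] / (w - 1), uv[:, 1] / (h - 1)], dim=-1) * 2 - 1
    grid = grid.view(1, -1, 1, 2)          # (1, N, 1, 2)
    # Bilinearly sample a feature vector for each projected point.
    sampled = F.grid_sample(feat_map, grid, align_corners=True)  # (1, C, N, 1)
    return sampled[0, :, :, 0].T           # (N, C) per-point features
```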
- Train the Grasping Field model:
CUDA_VISIBLE_DEVICES=4,5,6,7 python train.py --gpu 4-7 -e ../playground/hsdf_osdf_1net/experiments/obman_resnet18_hnerf3_onerf3.yaml
- Train the AlignSDF model:
CUDA_VISIBLE_DEVICES=4,5,6,7 python train.py --gpu 4-7 -e ../playground/hsdf_osdf_1net/experiments/obman_resnet18_hkine6_otrans6.yaml
- Train the gSDF model:
For ObMan:
# It first needs to train a checkpoint for hand pose estimation.
CUDA_VISIBLE_DEVICES=4,5,6,7 python train.py --gpu 4-7 -e ../playground/pose_kpt/experiments/obman_hand.yaml
# Then, load the pretrained pose checkpoint and train the SDF model.
CUDA_VISIBLE_DEVICES=4,5,6,7 python train.py --gpu 4-7 -e ../playground/hsdf_osdf_2net_pa/experiments/obman_presnet18_sresnet18_hkine6_okine6.yaml
For DexYCB:
# It first needs to train a checkpoint for hand pose estimation.
CUDA_VISIBLE_DEVICES=4,5,6,7 python train.py --gpu 4-7 -e ../playground/pose_kpt/experiments/dexycb_s0_hand.yaml
# Then, load the pretrained pose checkpoint and train the SDF model.
CUDA_VISIBLE_DEVICES=4,5,6,7 python train.py --gpu 4-7 -e ../playground/hsdf_osdf_2net_pa/experiments/dexycbs0_presnet18_sresnet18_hkine6_okine6.yaml hand_point_latent 51 obj_point_latent 72 ckpt ../outputs/pose_kpt/dexycbs0_29k_resnet18_rot0_6d_h1_o0_norm0_e100_b128_vw1.0_ocrw0.0_how1.0_sow0.0/model_dump/snapshot_99.pth.tar
# Train the model that processes multiple frames (DexYCB provides videos).
CUDA_VISIBLE_DEVICES=4,5,6,7 python train.py --gpu 4-7 -e ../playground/hsdf_osdf_2net_video_pa/experiments/dexycbs0_3frames_presnet18_sresnet18_hkine6_okine6.yaml hand_point_latent 51 obj_point_latent 72 ckpt path_to_pretrained_model
When training finishes, the script launches testing automatically. You can also launch testing explicitly:
For ObMan:
CUDA_VISIBLE_DEVICES=1 python test.py --gpu 1 -e ../outputs/hsdf_osdf_2net_pa/gsdf_obman/exp.yaml
For DexYCB:
CUDA_VISIBLE_DEVICES=7 python test.py --gpu 7 -e ../outputs/hsdf_osdf_2net_pa/dexycbs0_29k_resnet18_resnet18_h1_o1_sdf5_cls0_rot0_hand_kine_51_obj_kine_72_np2000_adf1_e1600_ae1201_scale6.2_b64_hsw0.5_osw0.5_hcw0.0_vw0.5/exp.yaml
After the testing phase ends, you can evaluate the performance. For ObMan:
CUDA_VISIBLE_DEVICES=1 python eval.py --gpu 1 -e ../outputs/hsdf_osdf_2net_pa/gsdf_obman
For DexYCB:
CUDA_VISIBLE_DEVICES=7 python eval.py -e ../outputs/hsdf_osdf_2net_pa/dexycbs0_29k_resnet18_resnet18_h1_o1_sdf5_cls0_rot0_hand_kine_51_obj_kine_72_np2000_adf1_e1600_ae1201_scale6.2_b64_hsw0.5_osw0.5_hcw0.0_vw0.5
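The evaluation reports mesh reconstruction metrics. For reference, a symmetric chamfer distance between two sampled point clouds can be computed as follows; this is a generic sketch with SciPy, not necessarily the repository's exact metric definition.

```python
import numpy as np
from scipy.spatial import cKDTree

def chamfer_distance(points_a, points_b):
    """Symmetric chamfer distance between two (N, 3) point clouds."""
    # For every point in A, the distance to its nearest neighbor in B,
    # and vice versa; average the squared distances in both directions.
    d_ab, _ = cKDTree(points_b).query(points_a)
    d_ba, _ = cKDTree(points_a).query(points_b)
    return np.mean(d_ab ** 2) + np.mean(d_ba ** 2)
```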
If you find this work useful, please consider citing:
@InProceedings{chen2023gsdf,
author = {Chen, Zerui and Chen, Shizhe and Schmid, Cordelia and Laptev, Ivan},
title = {{gSDF}: {Geometry-Driven} Signed Distance Functions for {3D} Hand-Object Reconstruction},
booktitle = {CVPR},
year = {2023},
}
Some of the code is built upon manopth, PoseNet, PCL, Grasping Field, and HALO. Thanks to the authors of these projects for their great work!