In recent years, there has been a surge of interest in open-vocabulary 3D scene reconstruction facilitated by visual language models (VLMs), which showcase remarkable capabilities in open-set retrieval. However, existing methods face some limitations: they either focus on learning point-wise features, resulting in blurry semantic understanding, or solely tackle object-level reconstruction, thereby overlooking the intricate details of the object's interior. To address these challenges, we introduce OpenObj, an innovative approach to building open-vocabulary object-level Neural Radiance Fields (NeRF) with fine-grained understanding. In essence, OpenObj establishes a robust framework for efficient and watertight scene modeling and comprehension at the object level. Moreover, we incorporate part-level features into the neural fields, enabling a nuanced representation of object interiors. This approach captures object-level instances while maintaining a fine-grained understanding. Results on multiple datasets demonstrate that OpenObj achieves superior performance in zero-shot semantic segmentation and retrieval tasks. Additionally, OpenObj supports real-world robotics tasks at multiple scales, including global movement and local manipulation.
Use conda to set up the required environment. To avoid dependency problems, it is recommended to follow the instructions below.
conda env create -f environment.yml
Follow the instructions to install the CropFormer model and download the pretrained weights CropFormer_hornet_3x.
Follow the instructions to install the TAP model and download the pretrained weights here.
pip install -U sentence-transformers
Download pretrained weights
git clone https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2
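After downloading, the weights can be loaded from the local folder. Below is a minimal sketch of encoding object captions with sentence-transformers (the local path `./all-MiniLM-L6-v2` is an assumption; adjust it to wherever you cloned the model):

```python
from sentence_transformers import SentenceTransformer

# Load the locally cloned all-MiniLM-L6-v2 weights (path is an assumption).
model = SentenceTransformer("./all-MiniLM-L6-v2")

# Encode a couple of example object captions into 384-d embeddings.
captions = ["a wooden chair with four legs", "a blue ceramic mug"]
embeddings = model.encode(captions, normalize_embeddings=True)
print(embeddings.shape)  # (2, 384)
```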
git clone https://github.com/BIT-DYN/OpenObj
cd OpenObj
OpenObj has been validated on Replica (using the same sequences as vMap) and ScanNet. Please download the following datasets.
- Replica Demo - Replica Room 0 only for faster experimentation.
- Replica - All Pre-generated Replica sequences.
- ScanNet - Official ScanNet sequences.
Run the following command to identify and comprehend object instances from the color images.
cd maskclustering
python3 mask_gen.py --input /data/dyn/object/vmap/room_0/imap/00/rgb/*.png --input_depth /data/dyn/object/vmap/room_0/imap/00/depth/*.png --output results/room_0/mask/ --opts MODEL.WEIGHTS CropFormer_hornet_3x_03823a.pth
You can see a visualization of the results in the results/vis folder.
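If you want to inspect these visualizations programmatically, here is a minimal sketch (the path and PNG file pattern are assumptions; the exact filenames depend on your sequence):

```python
import glob
from PIL import Image

# Grab the rendered mask visualizations (file pattern is an assumption).
vis_files = sorted(glob.glob("results/vis/*.png"))
print(f"Found {len(vis_files)} visualization frames")

# Preview the first frame, if any were produced.
if vis_files:
    Image.open(vis_files[0]).show()
```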
Run the following command to ensure consistent object association across frames.
python3 mask_graph.py --config_file ./configs/room_0.yaml --input_mask results/room_0/mask/mask_init_all.pkl --input_depth /data/dyn/object/vmap/room_0/imap/00/depth/*.png --input_pose /data/dyn/object/vmap/room_0/imap/00/traj_w_c.txt --output_graph results/room_0/mask/graph/ --input_rgb /data/dyn/object/vmap/room_0/imap/00/rgb/*.png --output_dir /data/dyn/object/vmap/room_0/imap/00/ --input_semantic /data/dyn/object/vmap/room_0/imap/00/semantic_class/*.png
You can see a visualization of the results in the results/graph folder.
This will also generate folders (class_our/ and instance_our/) and files (object_clipfeat.pkl, object_capfeat.pkl, and object_caption.pkl) in the data directory, which are required for the subsequent steps.
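These pickle files can be inspected directly. Below is a minimal sketch, assuming they were written with Python's pickle module; since their internal schema is not documented here, the sketch only reports basic information:

```python
import pickle

# Paths follow the --output_dir used above; adjust to your data directory.
data_dir = "/data/dyn/object/vmap/room_0/imap/00"

for name in ["object_clipfeat.pkl", "object_capfeat.pkl", "object_caption.pkl"]:
    with open(f"{data_dir}/{name}", "rb") as f:
        obj = pickle.load(f)
    # No specific schema is assumed; just report the container type and size.
    size = len(obj) if hasattr(obj, "__len__") else "n/a"
    print(f"{name}: type={type(obj).__name__}, len={size}")
```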
Run the following command to distinguish parts and extract their visual features.
cd ../partlevel
python sam_clip_dir.py --input_image /data/dyn/object/vmap/room_0/imap/00/rgb/*.png --output_dir /data/dyn/object/vmap/room_0/imap/00/partlevel --down_sample 5
This will generate a folder (partlevel/) in the data directory, which is necessary for the follow-up process.
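The part-level features are CLIP visual features, so at query time they can be compared against CLIP text embeddings. Below is a minimal sketch of that comparison using open_clip; the model variant, the stored feature format, and the file name part_feats.npy are assumptions (the repository's own scripts handle this internally):

```python
import numpy as np
import torch
import open_clip

# Load a CLIP model for text encoding (ViT-B-32 is an assumption; match the
# variant used when the part features were extracted).
model, _, _ = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k"
)
tokenizer = open_clip.get_tokenizer("ViT-B-32")

# Hypothetical part features, shape (num_parts, feat_dim), L2-normalized.
part_feats = np.load("part_feats.npy")

with torch.no_grad():
    text_feat = model.encode_text(tokenizer(["the seat of a chair"]))
    text_feat = torch.nn.functional.normalize(text_feat, dim=-1).numpy()

# Cosine similarity between the text query and every part feature.
sims = part_feats @ text_feat.T
print("Best-matching part index:", int(sims.argmax()))
```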
Run the following command to train the NeRFs for all objects in a vectorized manner.
cd ../nerf
python train.py --config ./configs/Replica/room_0.json --logdir results/room_0
This will generate a folder (ckpt/) in the results directory containing the network parameters for all objects.
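To verify that training produced checkpoints, you can list and load them. Below is a minimal sketch, assuming the checkpoints are standard PyTorch files under results/room_0/ckpt/ (the exact naming and contents are assumptions):

```python
import glob
import torch

# Checkpoint location follows the --logdir used above.
ckpt_files = sorted(glob.glob("results/room_0/ckpt/*"))
print(f"Found {len(ckpt_files)} object checkpoints")

if ckpt_files:
    # Load on CPU just to inspect what was saved; no assumption is made
    # about the state-dict layout beyond it being torch-serialized.
    ckpt = torch.load(ckpt_files[0], map_location="cpu")
    print(type(ckpt), list(ckpt.keys())[:5] if isinstance(ckpt, dict) else "")
```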
Run the following command to generate the visualization files.
cd ../nerf
python gen_map_vis.py --scene_name room_0 --dataset_name Replica
You can interact with the reconstructed scene using our visualization files.
cd ../nerf
python vis_interaction.py --scene_name room_0 --dataset_name Replica --is_partcolor
Then, in the Open3D visualizer window, you can use the following key callbacks to change the visualization (a sketch of how such callbacks are registered follows this list).
- Press C to toggle the ceiling.
- Press S to color the meshes by object class.
- Press R to color the meshes by RGB.
- Press I to color the meshes by object instance ID.
- Press O to color the meshes by part-level feature.
- Press F, then type an object text query and a number in the terminal; the meshes will be colored by the similarity.
- Press P, then type an object text query, a number, and a part text query in the terminal; the meshes will be colored by the similarity.
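The key bindings above rely on Open3D's key-callback mechanism. Below is a minimal, self-contained sketch of how such callbacks work; it illustrates the Open3D API only and is not the repository's actual vis_interaction.py:

```python
import open3d as o3d

# Build a trivial scene so the example is runnable on its own.
mesh = o3d.geometry.TriangleMesh.create_box()
mesh.compute_vertex_normals()

vis = o3d.visualization.VisualizerWithKeyCallback()
vis.create_window()
vis.add_geometry(mesh)

def paint_red(visualizer):
    # Example callback: recolor the mesh when the key is pressed.
    mesh.paint_uniform_color([1.0, 0.0, 0.0])
    visualizer.update_geometry(mesh)
    return False

# Bind the callback to the R key, analogous to the bindings listed above.
vis.register_key_callback(ord("R"), paint_red)
vis.run()
vis.destroy_window()
```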
If you find our work helpful, please cite:
@article{openobj,
title={OpenObj: Open-Vocabulary Object-Level Neural Radiance Fields with Fine-Grained Understanding},
author={Deng, Yinan and Wang, Jiahui and Zhao, Jingyu and Dou, Jianyu and Yang, Yi and Yue, Yufeng},
journal={arXiv preprint arXiv:2406.08009},
year={2024}
}
We would like to express our gratitude to the open-source project vMap and its contributors. Their valuable work has greatly contributed to the development of our codebase.