Towards Generating Realistic 3D Semantic Training Data for Autonomous Driving

Paper

This repo contains the code for the 3D semantic scene generation method proposed in the paper: "Towards Generating Realistic 3D Semantic Training Data for Autonomous Driving"

In this paper, we propose a 3D semantic scene generation method that requires neither image projections nor training multiple decoupled VAE and DDPM models. By training the VAE and the DDPM as a single model, we achieve more realistic scene generation compared to previous methods. In the paper, we also show that training a semantic segmentation network with real data together with scenes generated by our method improves its performance on the semantic segmentation task.

Dependencies

Install the Python package prerequisites (we used Python 3.9):

sudo apt install build-essential python3-dev libopenblas-dev

pip install -r requirements.txt

Install MinkowskiEngine:

pip install -U MinkowskiEngine==0.5.4 --install-option="--blas=openblas" -v --no-deps

To set up the code, run the following command in the repository's main directory:

pip install -U -e .
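
After installation, a quick sanity check is to build a small sparse tensor with MinkowskiEngine (a minimal sketch, not part of the repository):

import torch
import MinkowskiEngine as ME
# Build a tiny sparse tensor from random quantized coordinates.
coords = torch.randint(0, 100, (100, 3)).int()
feats = torch.rand(100, 4)
coords_batched = ME.utils.batched_coordinates([coords])  # prepend the batch index
x = ME.SparseTensor(features=feats, coordinates=coords_batched)
print(x.F.shape, x.C.shape)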

Conda Installation

You can also install the dependencies in a conda environment:

conda create --name 3diss python=3.9 && conda activate 3diss

Then, as before, install the Python package prerequisites:

sudo apt install build-essential python3-dev libopenblas-dev

pip install -r requirements.txt

And install MinkowskiEngine:

pip install -U git+https://github.com/NVIDIA/MinkowskiEngine -v --no-deps

NOTE: At the moment, MinkowskiEngine is not compatible with Python 3.10+; see this issue.

SemanticKITTI Dataset

The SemanticKITTI dataset has to be downloaded from the official site and extracted into the following structure:

./diss/
└── data/
    └── SemanticKITTI/
        └── dataset/
            └── sequences/
                ├── 00/
                │   ├── velodyne/
                │   │   ├── 000000.bin
                │   │   ├── 000001.bin
                │   │   └── ...
                │   └── labels/
                │       ├── 000000.label
                │       ├── 000001.label
                │       └── ...
                ├── 08/ # for validation
                ├── 11/ # 11-21 for testing
                └── 21/
                    └── ...

For the poses, we use PIN-SLAM to compute them. You can download the poses from here and extract them to ./diss/data/SemanticKITTI/dataset/sequences/pin_slam_poses.
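
As a quick sanity check of the extracted layout, the scans and labels can be read directly with NumPy (a minimal sketch; the file paths are just examples following the structure above):

import numpy as np
scan_file = "data/SemanticKITTI/dataset/sequences/00/velodyne/000000.bin"
label_file = "data/SemanticKITTI/dataset/sequences/00/labels/000000.label"
# Each scan stores (x, y, z, remission) as float32.
points = np.fromfile(scan_file, dtype=np.float32).reshape(-1, 4)
# Each label is a uint32: lower 16 bits = semantic class, upper 16 bits = instance id.
labels = np.fromfile(label_file, dtype=np.uint32)
semantic = labels & 0xFFFF
instance = labels >> 16
print(points.shape, np.unique(semantic))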

Ground truth generation

To generate the complete ground-truth scenes, you can run the sem_map_from_scans.py script. It uses the dataset scans and poses to generate the sequence maps used as ground truth during training:

python tools/sem_map_from_scans.py

Once the sequence maps are generated, you can train the VAE and diffusion models.
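
Conceptually, the script aggregates the labeled scans into a single map by transforming each scan with its pose. A rough sketch of the idea (not the actual implementation, and assuming KITTI-style pose files with one flattened 3x4 matrix per line):

import numpy as np

def load_poses(pose_file):
    # One flattened 3x4 transform per line (KITTI-style format assumed).
    poses = []
    for line in open(pose_file):
        pose = np.eye(4)
        pose[:3, :4] = np.array(line.split(), dtype=np.float64).reshape(3, 4)
        poses.append(pose)
    return poses

def aggregate(scan_files, label_files, poses):
    pts_all, lbl_all = [], []
    for scan_f, label_f, pose in zip(scan_files, label_files, poses):
        pts = np.fromfile(scan_f, dtype=np.float32).reshape(-1, 4)[:, :3]
        lbl = np.fromfile(label_f, dtype=np.uint32) & 0xFFFF
        # Transform the points from the sensor frame to the map frame.
        pts_h = np.hstack([pts, np.ones((len(pts), 1), dtype=np.float32)])
        pts_all.append((pose @ pts_h.T).T[:, :3])
        lbl_all.append(lbl)
    return np.concatenate(pts_all), np.concatenate(lbl_all)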

VAE Training

To train the VAE you can run the following command:

python vae_train.py

By default, the config matches the setup used in the paper: batch size 2 on 6 NVIDIA A40 GPUs. If you want to change the VAE training config, you can edit the config/vae.yaml file.

After the VAE is trained you can run the VAE refinement training with:

python vae_train.py --weights VAE_CKPT --config config/vae_refine.yaml

This runs the refinement training only on the VAE decoder weights.
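
In other words, the encoder stays frozen and only the decoder parameters are optimized. A conceptual sketch of this idea (the attribute names vae.encoder and vae.decoder are assumptions, not the repository's actual module names):

import torch

def decoder_only_optimizer(vae, lr=1e-4):
    # Freeze the encoder so only the decoder is refined.
    for p in vae.encoder.parameters():
        p.requires_grad = False
    return torch.optim.Adam(vae.decoder.parameters(), lr=lr)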

Diffusion Training

After the VAE is trained, you can run the following command to train the unconditional DDPM:

python diff_train.py --vae_weights VAE_CKPT

By default, the diffusion model is trained as an unconditional DDPM with the configuration used in the paper, on 8 NVIDIA A40 GPUs. If you want to change the configuration, you can edit config/diff.yaml.
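
For reference, a DDPM of this kind is trained with the standard noise-prediction objective; a generic sketch, purely illustrative (the actual objective, noise schedule, and denoiser are defined by the repository code and config/diff.yaml):

import torch
import torch.nn.functional as F

def ddpm_loss(denoiser, z0, alphas_cumprod):
    # z0: clean latents, shape (B, ...); alphas_cumprod: (T,) cumulative noise schedule.
    t = torch.randint(0, len(alphas_cumprod), (z0.shape[0],), device=z0.device)
    a_bar = alphas_cumprod.to(z0.device)[t].view(-1, *([1] * (z0.dim() - 1)))
    eps = torch.randn_like(z0)
    zt = a_bar.sqrt() * z0 + (1.0 - a_bar).sqrt() * eps  # forward noising
    return F.mse_loss(denoiser(zt, t), eps)  # predict the added noise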

For training with LiDAR scan conditioning, you can run:

python diff_train.py --vae_weights VAE_CKPT --config config/diff_cond_config.yaml --condition single_scan

This trains the model conditioned on the dataset's LiDAR point clouds.

Model Weights

You can download the trained model weights from the following links:

Diffusion Inference

For unconditional scene generation, we provide a pipeline that loads the trained diffusion and VAE models and uses them to generate a novel scene. You can run it with:

python tools/diff_pipeline.py --diff DIFF_CKPT --vae VAE_REFINE_CKPT

To run the pipeline for conditional scene generation, you can run:

python tools/diff_pipeline.py --path PATH_TO_SCANS --diff DIFF_CKPT --vae VAE_REFINE_CKPT --condition single_scan

The generated point cloud will be saved in results/{EXPERIMENT}/diff_x0.

To visualize the generated point clouds, we provide a visualization tool that can be used as follows:

python tools/pcd_vis.py --path results/{EXPERIMENT}/diff_x0
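
Alternatively, any point cloud viewer can be used. For instance, with Open3D (a minimal sketch assuming the generated scenes are stored as .ply files; check the pipeline output for the actual format):

import glob
import open3d as o3d

# Show the generated scenes one by one.
for path in sorted(glob.glob("results/EXPERIMENT/diff_x0/*.ply")):
    pcd = o3d.io.read_point_cloud(path)
    o3d.visualization.draw_geometries([pcd], window_name=path)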

Citation

If you use this repo, please cite as:

@article{nunes2025arxiv,
    author = {Lucas Nunes and Rodrigo Marcuzzi and Jens Behley and Cyrill Stachniss},
    title = {{Towards Generating Realistic 3D Semantic Training Data for Autonomous Driving}},
    journal = {arXiv preprint},
    year = {2025},
    volume = {arXiv:2503.21449},
    url = {https://arxiv.org/pdf/2503.21449},
}
