This repo contains the code for the 3D semantic scene generation method proposed in the paper: "Towards Generating Realistic 3D Semantic Training Data for Autonomous Driving"
In this paper, we propose a 3D semantic scene generation method that requires neither image projections nor training multiple decoupled VAE and DDPM models. By training the VAE and the DDPM within a single model, we achieve more realistic scene generation than previous methods. We also show that training a semantic segmentation network on real data together with scenes generated by our method improves its performance on the semantic segmentation task.
Install the Python package prerequisites (we used Python 3.9):
sudo apt install build-essential python3-dev libopenblas-dev
pip install -r requirements.txt
Installing MinkowskiEngine:
pip install -U MinkowskiEngine==0.5.4 --install-option="--blas=openblas" -v --no-deps
To set up the code, run the following command in the repository's main directory:
pip install -U -e .
You can also install the dependencies in a conda environment:
conda create --name 3diss python=3.9 && conda activate 3diss
Then, again install the Python package prerequisites:
sudo apt install build-essential python3-dev libopenblas-dev
pip install -r requirements.txt
And install MinkowskiEngine:
pip install -U git+https://github.com/NVIDIA/MinkowskiEngine -v --no-deps
NOTE: At the moment, MinkowskiEngine is not compatible with Python 3.10+, see this issue.
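Whichever install route you use, a quick sanity check can confirm that PyTorch and MinkowskiEngine are importable and that CUDA is visible. This is just a minimal sketch, not part of the repo:

# Minimal sanity check for the environment.
import torch
import MinkowskiEngine as ME

print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
print("MinkowskiEngine:", ME.__version__)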
The SemanticKITTI dataset has to be downloaded from the official site and extracted into the following structure:
./diss/
└── data/
    └── SemanticKITTI
        └── dataset
            └── sequences
                ├── 00/
                │   ├── velodyne/
                │   │   ├── 000000.bin
                │   │   ├── 000001.bin
                │   │   └── ...
                │   └── labels/
                │       ├── 000000.label
                │       ├── 000001.label
                │       └── ...
                ├── 08/ # for validation
                ├── 11/ # 11-21 for testing
                └── 21/
                    └── ...
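For reference, the files follow the standard SemanticKITTI layout: each velodyne/*.bin stores N x 4 float32 values (x, y, z, remission) and each labels/*.label stores N uint32 values whose lower 16 bits encode the semantic class. A minimal loading sketch (the file paths are just placeholders):

import numpy as np

# Load one LiDAR scan: N x 4 float32 (x, y, z, remission).
scan = np.fromfile("data/SemanticKITTI/dataset/sequences/00/velodyne/000000.bin",
                   dtype=np.float32).reshape(-1, 4)

# Load the matching labels: N uint32, lower 16 bits = semantic class id.
label = np.fromfile("data/SemanticKITTI/dataset/sequences/00/labels/000000.label",
                    dtype=np.uint32)
sem_label = label & 0xFFFF   # semantic class per point
inst_label = label >> 16     # instance id per point

print(scan.shape, sem_label.shape)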
For the poses, we use PIN-SLAM to compute them. You can download the poses from here and extract them to ./diss/data/SemanticKITTI/dataset/sequences/pin_slam_poses.
To generate the complete ground-truth scenes, you can run the sem_map_from_scans.py script. It uses the dataset scans and poses to generate the sequence map used as ground truth during training:
python tools/sem_map_from_scans.py
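Conceptually, the script aggregates the labeled scans of a sequence into a common map frame using the poses. The sketch below is not the repo's implementation; it only illustrates the idea and assumes KITTI-style pose files (one flattened 3x4 matrix per line), which is an assumption about the pin_slam_poses format:

import numpy as np

def load_poses(pose_file):
    # Assumption: KITTI-style text file, one flattened 3x4 pose matrix per line.
    poses = []
    for line in open(pose_file):
        T = np.eye(4)
        T[:3, :4] = np.array(line.split(), dtype=np.float64).reshape(3, 4)
        poses.append(T)
    return poses

def aggregate_scans(scans, labels, poses):
    # Transform every scan into the map frame and stack points with their labels.
    pts_map, lbl_map = [], []
    for scan, label, T in zip(scans, labels, poses):
        xyz1 = np.hstack([scan[:, :3], np.ones((len(scan), 1))])
        pts_map.append((xyz1 @ T.T)[:, :3])
        lbl_map.append(label & 0xFFFF)
    return np.vstack(pts_map), np.concatenate(lbl_map)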
Once the sequence map is generated, you can train the VAE and diffusion models.
To train the VAE you can run the following command:
python vae_train.py
By default, the config is set as used in the paper, training with batch size 2 on 6 NVIDIA A40 GPUs. If you want to change the VAE training config, you can edit the config/vae.yaml file.
After the VAE is trained you can run the VAE refinement training with:
python vae_train.py --weights VAE_CKPT --config config/vae_refine.yaml
This will run the refinement training only on the VAE decoder weights.
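As a rough illustration of decoder-only refinement (a generic PyTorch pattern, not the repo's actual code), this amounts to freezing the encoder and passing only the decoder parameters to the optimizer. The ToyVAE below is a hypothetical stand-in for the real sparse MinkowskiEngine model:

import torch
import torch.nn as nn

# Hypothetical stand-in: the real VAE uses sparse MinkowskiEngine blocks.
class ToyVAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(3, 16), nn.ReLU(), nn.Linear(16, 8))
        self.decoder = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 3))

vae = ToyVAE()

# Freeze the encoder so refinement only updates the decoder weights.
for p in vae.encoder.parameters():
    p.requires_grad = False

optimizer = torch.optim.Adam(vae.decoder.parameters(), lr=1e-4)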
After the VAE is trained, you can run the following command to train the unconditional DDPM:
python diff_train.py --vae_weights VAE_CKPT
By default, the diffusion training is set up as an unconditional DDPM with the configuration used in the paper, trained on 8 NVIDIA A40 GPUs. If you want to change the configuration, you can edit the config/diff.yaml file.
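For intuition, a standard unconditional DDPM training step boils down to noising a clean sample at a random timestep and regressing the added noise. The sketch below is generic, not the repo's architecture or code; the noise-prediction model is a placeholder:

import torch
import torch.nn.functional as F

T = 1000
betas = torch.linspace(1e-4, 2e-2, T)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

def ddpm_loss(model, x0):
    # Sample a timestep and Gaussian noise, build the noised sample x_t,
    # and train the model to predict the noise (epsilon-parameterization).
    t = torch.randint(0, T, (x0.shape[0],))
    noise = torch.randn_like(x0)
    a_bar = alphas_cumprod[t].view(-1, *([1] * (x0.dim() - 1)))
    x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise
    return F.mse_loss(model(x_t, t), noise)

# Example with a toy noise predictor that ignores the timestep (illustration only).
toy_model = lambda x, t: torch.zeros_like(x)
print(ddpm_loss(toy_model, torch.randn(4, 8)))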
For the LiDAR scan conditioning training you can run:
python diff_train.py --vae_weights VAE_CKPT --config config/diff_cond_config.yaml --condition single_scan
This will train the model conditioned on the dataset LiDAR point clouds.
You can download the trained model weights from the following links:
For running the unconditional scene generation, we provide a pipeline where both the trained diffusion and VAE models are loaded and used to generate a novel scene. You can run the pipeline with the command:
python tools/diff_pipeline.py --diff DIFF_CKPT --vae VAE_REFINE_CKPT
To run the pipeline for the conditional scene generation you can run:
python tools/diff_pipeline.py --path PATH_TO_SCANS --diff DIFF_CKPT --vae VAE_REFINE_CKPT --condition single_scan
The generated point cloud will be saved in results/{EXPERIMENT}/diff_x0.
To visualize the generated point clouds we provide a visualization tool which can be used as:
python tools/pcd_vis.py --path results/{EXPERIMENT}/diff_x0
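If you prefer to inspect a result manually, something like the following Open3D snippet can be used, assuming the generated scene is stored in a format Open3D can read (e.g., a .ply file; both the filename and extension here are assumptions, the actual output format is defined by the repo's tools):

import open3d as o3d

# Load and display one generated scene (path and format are assumptions).
pcd = o3d.io.read_point_cloud("results/EXPERIMENT/diff_x0/000000.ply")
print(pcd)
o3d.visualization.draw_geometries([pcd])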
If you use this repo, please cite as:
@article{nunes2025arxiv,
  author  = {Lucas Nunes and Rodrigo Marcuzzi and Jens Behley and Cyrill Stachniss},
  title   = {{Towards Generating Realistic 3D Semantic Training Data for Autonomous Driving}},
  journal = {arXiv preprint},
  year    = {2025},
  volume  = {arXiv:2503.21449},
  url     = {https://arxiv.org/pdf/2503.21449},
}