This is a third-party implementation of the CVPR 2024 paper *The Audio-Visual Conversational Graph: From an Egocentric-Exocentric Perspective*.
[Paper] [Supplement] [Project Page and Demo]
## Environment Setup

To set up the `avconv` conda environment with all required packages, run:

```bash
conda env create -f avconv.yaml
conda activate avconv
```
## Dataset Download & Availability

Please note that this repository does not provide any dataset download links. The original dataset used in the paper has not yet been publicly released by Meta. This page will be updated with the official source once it becomes available. Alternatively, you may collect your own multi-modal conversational data following the dataset description in the paper.
## Directory Structure

Once you obtain the dataset, organize it as follows:
### Audio-Visual Data

```
../data/av_data/{session_number}/image_{frame_number}.jpg
../data/av_data/{session_number}/a1_{frame_number}.mat
```
### Ground-Truth Label Files

```
../data/av_label/{session_number}.json
```
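For orientation, the sketch below shows how a session laid out this way could be enumerated and loaded in Python. It is not the repository's data loader: the function name `load_session`, the use of `scipy.io.loadmat` for the `a1_*.mat` audio features, and the assumption that the per-session label files are plain JSON are illustrative guesses based only on the layout above.

```python
import json
from pathlib import Path

from PIL import Image          # assumption: frames are standard JPEG images
from scipy.io import loadmat   # assumption: the .mat audio features are readable with scipy


def load_session(data_root, label_root, session_number):
    """Illustrative loader for one session following the layout above."""
    session_dir = Path(data_root) / str(session_number)
    label_file = Path(label_root) / f"{session_number}.json"

    with open(label_file) as f:
        labels = json.load(f)  # per-session ground-truth labels

    frames = []
    for image_path in sorted(session_dir.glob("image_*.jpg")):
        frame_number = image_path.stem.split("_", 1)[1]      # e.g. "image_0042" -> "0042"
        audio_path = session_dir / f"a1_{frame_number}.mat"
        frames.append({
            "frame_number": frame_number,
            "image": Image.open(image_path),
            "audio": loadmat(audio_path),  # dict of arrays stored in the .mat file
        })
    return frames, labels


# Example usage, mirroring the default config paths:
# frames, labels = load_session("../data/av_data", "../data/av_label", "1")
```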
## Update Paths in Parameter Files

In both the training and evaluation configs (`params_train.json`, `params_test.json`), make sure the following fields are set correctly:

```
data_path: "../data/av_data"
label_path: "../data/av_label"
```
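Before launching training or evaluation, it can be worth confirming that these paths actually resolve. The snippet below is a minimal, hypothetical check (not a script shipped with this repo); it only assumes the parameter files are plain JSON containing the fields listed above.

```python
import json
from pathlib import Path


def check_params(params_file):
    """Print whether the directories configured in a parameter file exist."""
    with open(params_file) as f:
        params = json.load(f)

    for key in ("data_path", "label_path"):
        path = Path(params[key])
        status = "ok" if path.is_dir() else "MISSING"
        print(f"{params_file}: {key} = {path} [{status}]")


check_params("./params/params_train.json")
check_params("./params/params_test.json")
```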
## Training

- Parameter file: `./params/params_train.json`
- Required paths: `data_path`, `label_path`, `log_path`
- Checkpoints and TensorBoard logs are saved under `log_path`; see the sketch below for locating saved checkpoints.

To start training:

```bash
python train_net.py
```
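Because checkpoints are written under `log_path`, one way to choose a value for `checkpoint_path` in the evaluation config is simply to list the most recently saved files there. This is a convenience sketch under the assumption that checkpoints are ordinary files somewhere below `log_path`; the actual file naming is determined by `train_net.py`.

```python
from pathlib import Path

# Hypothetical location; use whatever log_path is set to in params_train.json.
log_path = Path("../logs")

# Newest files first, so the most recent checkpoint is easy to spot.
saved_files = sorted(
    (p for p in log_path.rglob("*") if p.is_file()),
    key=lambda p: p.stat().st_mtime,
    reverse=True,
)
for path in saved_files[:5]:
    print(path)
```

The TensorBoard logs written to the same location can be monitored with `tensorboard --logdir <log_path>`.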
## Evaluation

- Parameter file: `./params/params_test.json`
- Required paths: `data_path`, `label_path`, `checkpoint_path`, `out_path`
- Set `checkpoint_path` to the checkpoint you want to evaluate.
- Set `out_path` to specify where the output `preds.pkl` prediction files are saved.

To run evaluation:

```bash
python test_net.py
```

Predictions will be saved at:

```
./output/{ckpt_log}_inference/preds.pkl
```
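The saved predictions can be read back with Python's `pickle` module for inspection. The exact structure of the stored object is defined by `test_net.py` and is not documented here, so the sketch below only peeks at its type and top-level contents.

```python
import pickle

# Replace {ckpt_log} with the actual checkpoint log name used for inference.
preds_file = "./output/{ckpt_log}_inference/preds.pkl"

with open(preds_file, "rb") as f:
    preds = pickle.load(f)

print(type(preds))
# The layout of `preds` is defined by test_net.py; dicts and lists are the common cases.
if isinstance(preds, dict):
    print(list(preds.keys())[:10])
elif isinstance(preds, (list, tuple)):
    print(len(preds), preds[:1])
```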
## Disclaimer

This implementation does not contain any proprietary code, internal tools, or unpublished resources from Meta. All components, including the architecture, data loaders, and configurations, were reproduced independently for academic and community research purposes.
## Citation

If you find this work useful for your research, please cite:
```bibtex
@inproceedings{jia2024audio,
  title={The Audio-Visual Conversational Graph: From an Egocentric-Exocentric Perspective},
  author={Jia, Wenqi and Liu, Miao and Jiang, Hao and Ananthabhotla, Ishwarya and Rehg, James M and Ithapu, Vamsi Krishna and Gao, Ruohan},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2024}
}
```