VSSD

VSSD: Vision Mamba with Non-Causal State Space Duality

Paper: arXiv:2407.18559

Updates

  • July 29, 2024: With MESA introduced during training, as in MLLA, VSSD-B achieves 85.4% top-1 accuracy on ImageNet-1K!
  • July 25, 2024: We release the code, logs, and checkpoints for VSSD.

Introduction

Recently, State Space Duality (SSD), an improved variant of SSMs, was introduced in Mamba2 to enhance model performance and efficiency. However, the inherent causal nature of SSD/SSMs restricts their application to non-causal vision tasks. To address this limitation, we introduce the Visual State Space Duality (VSSD) model, a non-causal formulation of SSD. This repository contains the code for training and evaluating VSSD variants on ImageNet-1K for image classification, COCO for object detection, and ADE20K for semantic segmentation. For more information, please refer to our paper.
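To make the causal/non-causal distinction concrete, here is a toy PyTorch sketch contrasting a causal, prefix-style state aggregation (each position only sees earlier positions) with a non-causal, global aggregation over all positions. It is only a schematic illustration with made-up tensor names and shapes, it ignores the decay term, and it is not the actual VSSD formulation or code from this repository.

```python
import torch

B, L, D, N = 2, 8, 16, 4        # batch, sequence length, channels, state size (toy values)
x = torch.randn(B, L, D)        # token features
Bm = torch.randn(B, L, N)       # per-token input projections ("B" in SSM notation)
Cm = torch.randn(B, L, N)       # per-token output projections ("C" in SSM notation)

# Causal (SSM/SSD-style) aggregation: position t only uses positions s <= t.
state = torch.zeros(B, N, D)
y_causal = []
for t in range(L):
    state = state + Bm[:, t].unsqueeze(-1) * x[:, t].unsqueeze(1)  # accumulate prefix state
    y_causal.append(torch.einsum('bn,bnd->bd', Cm[:, t], state))
y_causal = torch.stack(y_causal, dim=1)                            # (B, L, D)

# Non-causal aggregation: one global state built from *all* positions,
# so every output position sees the whole image.
global_state = torch.einsum('bln,bld->bnd', Bm, x)                 # sum over all positions
y_noncausal = torch.einsum('bln,bnd->bld', Cm, global_state)       # (B, L, D)
```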

Main Results

Classification on ImageNet-1K

| name | pretrain | resolution | acc@1 | #params | FLOPs | logs&ckpts |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: |
| VSSD-Micro | ImageNet-1K | 224x224 | 82.5 | 14M | 2.3G | log&ckpt |
| VSSD-Tiny | ImageNet-1K | 224x224 | 83.6 | 24M | 4.5G | log&ckpt |
| VSSD-Small | ImageNet-1K | 224x224 | 84.1 | 40M | 7.4G | log&ckpt |
| VSSD-Base | ImageNet-1K | 224x224 | 84.7 | 89M | 16.1G | log&ckpt |

Enhanced model with MESA:

| name | pretrain | resolution | acc@1 | #params | FLOPs | logs&ckpts |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: |
| VSSD-Small | ImageNet-1K | 224x224 | 84.5 | 40M | 7.4G | coming |
| VSSD-Base | ImageNet-1K | 224x224 | 85.4 | 89M | 16.1G | coming |

Object Detection on COCO

| Backbone | #params | FLOPs | Detector | box mAP | mask mAP | logs&ckpts |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: |
| VSSD-Micro | 33M | 220G | MaskRCNN@1x | 45.4 | 41.3 | log&ckpt |
| VSSD-Tiny | 44M | 265G | MaskRCNN@1x | 46.9 | 42.6 | log&ckpt |
| VSSD-Small | 59M | 325G | MaskRCNN@1x | 48.4 | 43.5 | log&ckpt |
| VSSD-Micro | 33M | 220G | MaskRCNN@3x | 47.7 | 42.8 | log&ckpt |
| VSSD-Tiny | 44M | 265G | MaskRCNN@3x | 48.8 | 43.6 | log&ckpt |

Semantic Segmentation on ADE20K

| Backbone | Input | #params | FLOPs | Segmentor | mIoU (SS) | mIoU (MS) | logs&ckpts |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| VSSD-Micro | 512x512 | 42M | 893G | UperNet@160k | 45.6 | 46.0 | log&ckpt |
| VSSD-Tiny | 512x512 | 53M | 941G | UperNet@160k | 47.9 | 48.7 | log&ckpt |

Getting Started

Installation

Step 1: Clone the VSSD repository:

git clone https://github.com/YuHengsss/VSSD.git
cd VSSD

Step 2: Environment Setup:

Create and activate a new conda environment

conda create -n VSSD
conda activate VSSD

Install Dependencies

pip install -r requirements.txt

Dependencies for Detection and Segmentation (optional)

pip install mmengine==0.10.1 mmcv==2.1.0 opencv-python-headless ftfy regex
pip install mmdet==3.3.0 mmsegmentation==1.2.2 mmpretrain==1.2.0
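
Before launching a detection or segmentation job, you can optionally verify that the pinned OpenMMLab packages import correctly; the following quick check is not part of this repository:

```python
# Sanity check: the detection/segmentation dependencies are importable and
# the installed versions match the pinned ones above.
import mmengine, mmcv, mmdet, mmseg

print("mmengine:", mmengine.__version__)  # expect 0.10.1
print("mmcv:", mmcv.__version__)          # expect 2.1.0
print("mmdet:", mmdet.__version__)        # expect 3.3.0
print("mmseg:", mmseg.__version__)        # expect 1.2.2
```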

Quick Start

Classification

To train VSSD models for classification on ImageNet, use the following commands for different configurations:

python -m torch.distributed.launch --nnodes=1 --node_rank=0 --nproc_per_node=8 --master_addr="127.0.0.1" --master_port=29501 main.py --cfg </path/to/config> --batch-size 128 --data-path </path/of/dataset> --output /tmp

If you only want to test performance (together with the parameter count and FLOPs):

python -m torch.distributed.launch --nnodes=1 --node_rank=0 --nproc_per_node=1 --master_addr="127.0.0.1" --master_port=29501 main.py --cfg </path/to/config> --batch-size 128 --data-path </path/of/dataset> --output /tmp --resume </path/of/checkpoint> --eval
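
For reference, the parameter and FLOP counts in the tables above are of the kind typically measured from a single 224x224 input. The sketch below shows one common way to compute such numbers with fvcore, using a stand-in torchvision model because building a VSSD model depends on this repository's config system; fvcore is an extra dependency here, and main.py already reports these numbers when run with --eval.

```python
import torch
from fvcore.nn import FlopCountAnalysis, parameter_count
from torchvision.models import resnet18

# Stand-in model for illustration only; swap in a VSSD model built from your config.
model = resnet18().eval()
dummy = torch.randn(1, 3, 224, 224)  # a single 224x224 image, as in the tables above

flops = FlopCountAnalysis(model, dummy)
params = parameter_count(model)[""]  # total parameter count

print(f"params: {params / 1e6:.1f}M, FLOPs: {flops.total() / 1e9:.1f}G")
```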

Detection and Segmentation

To evaluate with mmdetection or mmsegmentation:

bash ./tools/dist_test.sh </path/to/config> </path/to/checkpoint> 1

Use --tta to obtain the multi-scale mIoU (MS) for segmentation.

To train with mmdetection or mmsegmentation:

bash ./tools/dist_train.sh </path/to/config> 8

Citation

If VSSD is helpful for your research, please cite the following paper:

@article{shi2024vssd,
         title={VSSD: Vision Mamba with Non-Causal State Space Duality}, 
         author={Yuheng Shi and Minjing Dong and Mingjia Li and Chang Xu},
         journal={arXiv preprint arXiv:2407.18559},
         year={2024}
}

Acknowledgment

This project is based on VMamba (paper, code), Mamba2 (paper, code), Swin-Transformer (paper, code), and OpenMMLab. Thanks for their excellent work.
