Stars
Qwen2-VL is the multimodal large language model series developed by the Qwen team, Alibaba Cloud.
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
[ICLR 2024 Spotlight] Bounding Box Stability against Feature Dropout Reflects Detector Generalization across Environments
Official code for Customizable Embodied Multi-modal Perturbations for SLAM Robustness Benchmarking
The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.
The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.
The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.
[CVPR 2023] RILS: Masked Visual Reconstruction in Language Semantic Space (https://arxiv.org/abs/2301.06958)
Bottom-up Object Detection by Grouping Extreme and Center Points
Multi-Scale Spatio-Temporal Attention based Video Instance Segmentation
[NeurIPS 2022 Spotlight] VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
MindSpore is a new open source deep learning training/inference framework that could be used for mobile, edge and cloud scenarios.
Real-time Object Detection for Streaming Perception, CVPR 2022
[Under preparation] Code repo for "Open-Vocabulary DETR with Conditional Matching" (ECCV 2022)
BigDetection: A Large-scale Benchmark for Improved Object Detector Pre-training
[CVPR 2022] SparseInst: Sparse Instance Activation for Real-Time Instance Segmentation
Temporally Efficient Vision Transformer for Video Instance Segmentation, CVPR 2022, Oral
Official Implementation of DE-DETR and DELA-DETR in "Towards Data-Efficient Detection Transformers"
Best Practices, code samples, and documentation for Computer Vision.
EDTER: Edge Detection with Transformer, in CVPR 2022
[CVPR2023] All in One: Exploring Unified Video-Language Pre-training
Code for "Deep Snake for Real-Time Instance Segmentation" CVPR 2020 oral
A general and accurate MACs / FLOPs profiler for PyTorch models
Official MegEngine implementation of RepLKNet
Scaling Up Your Kernels to 31x31: Revisiting Large Kernel Design in CNNs (CVPR 2022)
[ICLR 2022] Official implementation of the paper "DAB-DETR: Dynamic Anchor Boxes are Better Queries for DETR"
[TPAMI 2024 & CVPR 2022] Attention Concatenation Volume for Accurate and Efficient Stereo Matching
(AAAI 2023 Oral) PyTorch implementation of "CF-ViT: A General Coarse-to-Fine Method for Vision Transformer"
[ECCV2022] This is an official implementation of the paper "RankSeg: Adaptive Pixel Classification with Image Category Ranking for Segmentation".