vision-language-transformer

Here are 18 public repositories matching this topic...

salesforce / LAVIS

LAVIS - A One-stop Library for Language-Vision Intelligence

deep-learning salesforce image-captioning deep-learning-library vision-framework vision-and-language multimodal-deep-learning multimodal-datasets vision-language-transformer vision-language-pretraining visual-question-anwsering

Updated Nov 18, 2024
Jupyter Notebook

IDEA-Research / GroundingDINO

Star

[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"

open-world object-detection vision-language vision-language-transformer open-world-detection

Updated Aug 12, 2024
Python

salesforce / BLIP

Star

PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

image-captioning visual-reasoning visual-question-answering vision-language vision-language-transformer image-text-retrieval vision-and-language-pre-training

Updated Aug 5, 2024
Jupyter Notebook

AlibabaResearch / AdvancedLiterateMachinery

Star

A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.

ocr computer-vision artificial-intelligence text-recognition document text-detection document-analysis end-to-end-ocr multimodal scene-text-recognition multimodal-deep-learning scene-text-detection vision-language document-understanding scene-text-detection-recognition document-recognition document-intelligence documentai vision-language-transformer vision-language-model

Updated Nov 22, 2024
C++

henghuiding / ReLA

Star

[CVPR2023 Highlight] GRES: Generalized Referring Expression Segmentation

multimodal-learning referring-image-segmentation referring-expression-segmentation referring-expression-comprehension vision-language-transformer cvpr2023

Updated Sep 5, 2023
Python

shenyunhang / APE

Star

[CVPR 2024] Aligning and Prompting Everything All at Once for Universal Visual Perception

open-world object-detection image-segmentation referring-expression-comprehension vision-language-transformer

Updated May 8, 2024
Python

henghuiding / Vision-Language-Transformer

Star

[ICCV2021 & TPAMI2023] Vision-Language Transformer and Query Generation for Referring Segmentation

tensorflow keras transformer vision-language referring-segmentation tpami iccv2021 vision-language-transformer

Updated Jan 7, 2022
Python

sdc17 / UPop

Star

[ICML 2023] UPop: Unified and Progressive Pruning for Compressing Vision-Language Transformers.

framework sparsity image-captioning pruning structured model-compression visual-reasoning multimodal-learning visual-question-answering weight-pruning efficient-deep-learning vision-transformer vision-language-transformer image-text-retrieval text-image-retrieval

Updated Nov 4, 2023
Python

haoliuhl / instructrl

Star

Instruction Following Agents with Multimodal Transforemrs

machine-learning reinforcement-learning instructions transformer flax jax instruction-following vision-language-transformer

Updated Nov 3, 2022
Python

sMamooler / CLIP_Explainability

Star

code for studying OpenAI's CLIP explainability

machine-learning computer-vision gradcam-visualization model-explainability openai-clip vision-language-transformer

Updated Jan 7, 2022
Jupyter Notebook

sdc17 / CrossGET

Star

[ICML 2024] CrossGET: Cross-Guided Ensemble of Tokens for Accelerating Vision-Language Transformers.

framework transformer image-captioning visual-reasoning multimodal-learning visual-question-answering model-acceleration efficient-deep-learning vision-language-transformer image-text-retrieval text-image-retrieval token-ensemble token-matching

Updated Oct 4, 2023

yiren-jian / BLIText

Star

[NeurIPS 2023] Bootstrapping Vision-Language Learning with Decoupled Language Pre-training

multimodal-deep-learning vision-language-transformer vision-language-pretraining

Updated Dec 5, 2023
Python

unitaryai / VTC

Star

VTC: Improving Video-Text Retrieval with User Comments

comments video-understanding multimodal-deep-learning video-text-retrieval vision-language-transformer vision-language-pretraining

Updated Nov 3, 2024
Python

marialymperaiou / knowledge-enhanced-multimodal-learning

Star

A list of research papers on knowledge-enhanced multimodal learning

knowledge-graph multi-task-learning visual-reasoning visual-dialog visual-question-answering vision-and-language multimodal-deep-learning visual-storytelling multimodal-retrieval visual-grounding visual-commonsense-reasoning vision-and-language-navigation story-visualization image-text-matching vision-language-transformer image-text-retrieval vision-and-language-pre-training conditional-image-generation knowledge-enhanced-multimodal-learning knowledge-enhanced-vision-language

Updated Dec 8, 2022

ThomasVonWu / Awesome-VLMs-Strawberry

Star

A collection of VLMs papers, blogs, and projects, with a focus on VLMs in Autonomous Driving and related reasoning techniques.

multimodal-learning vision-language-transformer llm vlms

Updated Nov 16, 2024

aurooj / VLM_SS

Star

Mini-batch selective sampling for knowledge adaption of VLMs for mammography.

medical-imaging miccai mammogram multimodal-learning vision-and-language multimodal-retrieval vision-language-transformer multimodal-representation-learning miccai2024 medical-vision-language-model minibatch-selective-sampling

Updated Oct 7, 2024
Jupyter Notebook

PrateekJannu / Vision-GPT

Star

Coding a Multi-Modal vision model like GPT-4o from scratch, inspired by @hkproj and PaliGemma

open-source machine-learning google artificial-intelligence gemini transformer-architecture transformer-models vision-transformer vision-language-transformer large-language-models llm large-language-model vision-language-model gpt-4o

Updated Nov 17, 2024
Python

atharva-naik / MMML-TermProject-VizWiz-VQA-Challenge

Star

VizWiz Challenge Term Project for Multi Modal Machine Learning @ CMU (11777)

open-source opencv natural-language-processing computer-vision image-processing pytorch question-answering open-source-project carnegie-mellon-university term-project visual-question-answering vizwiz vision-language vision-language-transformer vizwiz-vqa

Updated Sep 13, 2023
Python

Improve this page

Add a description, image, and links to the vision-language-transformer topic page so that developers can more easily learn about it.

Curate this topic

Add this topic to your repo

To associate your repository with the vision-language-transformer topic, visit your repo's landing page and select "manage topics."

Learn more

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

vision-language-transformer

Here are 18 public repositories matching this topic...

salesforce / LAVIS

IDEA-Research / GroundingDINO

salesforce / BLIP

AlibabaResearch / AdvancedLiterateMachinery

henghuiding / ReLA

shenyunhang / APE

henghuiding / Vision-Language-Transformer

sdc17 / UPop

haoliuhl / instructrl

sMamooler / CLIP_Explainability

sdc17 / CrossGET

yiren-jian / BLIText

unitaryai / VTC

marialymperaiou / knowledge-enhanced-multimodal-learning

ThomasVonWu / Awesome-VLMs-Strawberry

aurooj / VLM_SS

PrateekJannu / Vision-GPT

atharva-naik / MMML-TermProject-VizWiz-VQA-Challenge

Improve this page

Add this topic to your repo