[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
☁️ Build multimodal AI applications with a cloud-native stack
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
Janus-Series: Unified Multimodal Understanding and Generation Models
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal AI, and Speech AI (Automatic Speech Recognition and Text-to-Speech).
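As a taste of the NeMo API, here is a minimal ASR sketch; `sample.wav` is a hypothetical file, and `stt_en_conformer_ctc_small` is one of NVIDIA's published pretrained checkpoint names:

```python
# Minimal NeMo ASR sketch: load a pretrained checkpoint and transcribe
# a local WAV file ("sample.wav" is a hypothetical path).
import nemo.collections.asr as nemo_asr

asr_model = nemo_asr.models.ASRModel.from_pretrained(
    model_name="stt_en_conformer_ctc_small"
)

# transcribe() accepts a list of audio paths and returns transcripts.
transcripts = asr_model.transcribe(["sample.wav"])
print(transcripts[0])
```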
The easiest way to serve AI apps and models: build model inference APIs, job queues, LLM apps, multi-model pipelines, and more!
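For a flavor of the workflow, a minimal sketch in the BentoML 1.2+ decorator style; the class and method names are illustrative placeholders, not from any particular repo:

```python
# Minimal BentoML service sketch (decorator style of BentoML >= 1.2).
# "Echo" and "echo" are placeholder names for illustration.
import bentoml


@bentoml.service
class Echo:
    @bentoml.api
    def echo(self, text: str) -> str:
        # A real service would invoke a model here; this one echoes input.
        return text
```

Saved as `service.py`, this can be served locally with `bentoml serve service:Echo`.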
Use PEFT or full-parameter training to finetune 450+ LLMs (Qwen2.5, InternLM3, GLM4, Llama3.3, Mistral, Yi1.5, Baichuan2, DeepSeek-R1, ...) and 150+ MLLMs (Qwen2.5-VL, Qwen2-Audio, Llama3.2-Vision, Llava, InternVL2.5, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, Yi-VL, DeepSeek-VL2, Phi3.5-Vision, GOT-OCR2, ...).
This repository is a curated collection of links to various courses and resources about Artificial Intelligence (AI)
A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
Build real-time multimodal AI applications 🤖🎙️📹
TEN Agent is a conversational AI agent powered by the TEN framework, integrating Gemini 2.0 Live, OpenAI Realtime, RTC, and more. It delivers real-time capabilities to see, hear, and speak, and is fully compatible with popular workflow platforms like Dify and Coze.
The Enterprise-Grade Production-Ready Multi-Agent Orchestration Framework. Website: https://swarms.ai
Plug-and-play implementation of Tree of Thoughts: Deliberate Problem Solving with Large Language Models, which elevates model reasoning by at least 70%.
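To make the idea concrete, here is a generic breadth-first Tree-of-Thoughts loop in plain Python; `propose` and `score` are hypothetical stand-ins for LLM calls, not this repository's API:

```python
# Generic Tree-of-Thoughts sketch: breadth-first search over partial
# solutions with beam pruning. `propose` and `score` are hypothetical
# stand-ins for LLM calls, not the repository's actual API.
from typing import Callable, List


def tree_of_thoughts(
    root: str,
    propose: Callable[[str], List[str]],  # expand a thought into candidates
    score: Callable[[str], float],        # value estimate for a thought
    depth: int = 3,
    beam: int = 5,
) -> str:
    frontier = [root]
    for _ in range(depth):
        # Expand every frontier thought, then keep only the `beam` best.
        candidates = [c for t in frontier for c in propose(t)]
        if not candidates:
            break
        frontier = sorted(candidates, key=score, reverse=True)[:beam]
    return max(frontier, key=score)
```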
Fengshenbang-LM (封神榜大模型) is an open-source large-model ecosystem led by the Cognitive Computing and Natural Language Research Center of the IDEA Research Institute, serving as infrastructure for Chinese AIGC and cognitive intelligence.
Easily turn large sets of image URLs into an image dataset. Can download, resize, and package 100M URLs in 20h on one machine.
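Typical usage is a single call; a minimal sketch, assuming a text file of URLs (`urls.txt` is a hypothetical path):

```python
# Minimal img2dataset sketch: download and resize images listed in a
# text file of URLs ("urls.txt" is a hypothetical path).
from img2dataset import download

download(
    url_list="urls.txt",      # one image URL per line
    output_folder="images",   # where the downloaded shards are written
    image_size=256,           # target size for resizing
    thread_count=64,          # parallel download threads
)
```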
🪩 Create Disco Diffusion artworks in one line
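That one line looks like this, assuming discoart is installed and a GPU is available; the prompt text is just an example:

```python
# discoart's one-liner: generate Disco Diffusion artwork from a prompt.
# The prompt is an arbitrary example; create() also runs with defaults.
from discoart import create

da = create(text_prompts="A painting of sea cliffs in a tumultuous storm")
```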
OpenMMLab Pre-training Toolbox and Benchmark
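A quick inference sketch with the mmpretrain API; `demo.jpg` is a hypothetical image, and the model name is one of the toolbox's registered ImageNet configs:

```python
# Minimal mmpretrain inference sketch: classify a local image with a
# registered ImageNet model ("demo.jpg" is a hypothetical path).
from mmpretrain import inference_model

result = inference_model("resnet50_8xb32_in1k", "demo.jpg")
print(result["pred_class"])
```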
Mobile-Agent: The Powerful Mobile Device Operation Assistant Family
Code and models for NExT-GPT: Any-to-Any Multimodal Large Language Model
InternGPT (iGPT) is an open-source demo platform where you can easily showcase your AI models. It now supports DragGAN, ChatGPT, ImageBind, multimodal chat like GPT-4, SAM, interactive image editing, etc. Try it at igpt.opengvlab.com (an online demo system supporting DragGAN, ChatGPT, ImageBind, and SAM).