Tsinghua University
Stars
GLM-4 series: Open Multilingual Multimodal Chat LMs
Official implementation of Inf-DiT: Upsampling Any-Resolution Image with Memory-Efficient Diffusion Transformer
ScreenAgent: A Computer Control Agent Driven by Visual Language Large Model (IJCAI-24)
A state-of-the-art-level open visual language model | Multimodal pre-trained model
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
The official implementation of "Relay Diffusion: Unifying diffusion process across resolutions for image synthesis" [ICLR 2024 Spotlight]
Chinese and English multimodal conversational language model
[NeurIPS 2023] ImageReward: Learning and Evaluating Human Preferences for Text-to-image Generation
Neighborhood Attention Transformer, arXiv 2022 / CVPR 2023. Dilated Neighborhood Attention Transformer, arXiv 2022
[ICCV 2023 Oral] Text-to-Image Diffusion Models are Zero-Shot Video Generators
A collection of resources and papers on Diffusion Models
[ICCV 2023] A latent space for stochastic diffusion models
A high-performance Python-based I/O system for large (and small) deep learning problems, with strong support for PyTorch.
GLM-130B: An Open Bilingual Pre-Trained Model (ICLR 2023)
Text- and image-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
SwissArmyTransformer is a flexible and powerful library to develop your own Transformer variants.
FILM: Frame Interpolation for Large Motion, in ECCV 2022.
Unofficial PyTorch implementation of Masked Autoencoders Are Scalable Vision Learners
Download DeepMind's Kinetics dataset.
Taming Transformers for High-Resolution Image Synthesis
Text-to-image generation. The repo for the NeurIPS 2021 paper "CogView: Mastering Text-to-Image Generation via Transformers".
A collection of awesome resources in human pose estimation.