
- Shanghai (UTC +08:00)
Starred repositories
OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient.
FlashInfer: Kernel Library for LLM Serving
Finetune Llama 3.3, DeepSeek-R1 & Reasoning LLMs 2x faster with 70% less memory! 🦥
A lightweight data processing framework built on DuckDB and 3FS.
A high-performance distributed file system designed to address the challenges of AI training and inference workloads.
Analyze computation-communication overlap in V3/R1.
A bidirectional pipeline parallelism algorithm for computation-communication overlap in V3/R1 training.
Claude Code is an agentic coding tool that lives in your terminal, understands your codebase, and helps you code faster by executing routine tasks, explaining complex code, and handling git workflo…
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
DeepEP: an efficient expert-parallel communication library
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
Accessible large language models via k-bit quantization for PyTorch.
🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of Optimum's hardware optimizations & quantization schemes.
Fast and vectorizable algorithms for searching in a vector of sorted floating point numbers
PyTorch native quantization and sparsity for training and inference
Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
Official inference repo for FLUX.1 models
Efficient Triton Kernels for LLM Training
GGUF Quantization support for native ComfyUI models
Scalable RL solution for advanced reasoning of language models
[SIGMOD 2024] RaBitQ: Quantizing High-Dimensional Vectors with a Theoretical Error Bound for Approximate Nearest Neighbor Search
The Hugging Face Course on Transformers for Audio
This repository contains the Hugging Face Agents Course.