Organizations

@milvus-io

Starred repositories

OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient.

C++ 7,261 796 Updated Mar 5, 2025

Material for gpu-mode lectures

Jupyter Notebook 3,909 397 Updated Feb 9, 2025

FlashInfer: Kernel Library for LLM Serving

Cuda 2,296 238 Updated Mar 6, 2025

Finetune Llama 3.3, DeepSeek-R1 & Reasoning LLMs 2x faster with 70% less memory! 🦥

Python 33,661 2,355 Updated Mar 6, 2025

A lightweight data processing framework built on DuckDB and 3FS.

Python 3,833 311 Updated Mar 5, 2025

A high-performance distributed file system designed to address the challenges of AI training and inference workloads.

C++ 7,528 647 Updated Mar 6, 2025

Latency benchmarks of Unix IPC mechanisms

C 567 167 Updated Sep 25, 2023

Expert Parallelism Load Balancer

Python 1,018 146 Updated Feb 27, 2025

Analyze computation-communication overlap in V3/R1.

886 113 Updated Mar 3, 2025

A bidirectional pipeline parallelism algorithm for computation-communication overlap in V3/R1 training.

Python 2,504 239 Updated Mar 5, 2025

Claude Code is an agentic coding tool that lives in your terminal, understands your codebase, and helps you code faster by executing routine tasks, explaining complex code, and handling git workflo…

Shell 5,782 280 Updated Mar 6, 2025

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda 4,798 464 Updated Mar 5, 2025

DeepEP: an efficient expert-parallel communication library

Cuda 7,023 599 Updated Mar 6, 2025

FlashMLA: Efficient MLA decoding kernels

C++ 11,160 774 Updated Mar 1, 2025

Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation

6,631 186 Updated Mar 4, 2025

Accessible large language models via k-bit quantization for PyTorch.

Python 6,762 670 Updated Mar 4, 2025

🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of Optimum's hardware optimizations & quantization schemes.

Python 289 56 Updated Jan 31, 2025

Fast and vectorizable algorithms for searching in a vector of sorted floating point numbers

C++ 130 14 Updated Dec 19, 2024

PyTorch native quantization and sparsity for training and inference

Python 1,883 227 Updated Mar 6, 2025

Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM

Python 1,057 93 Updated Mar 6, 2025

Official inference repo for FLUX.1 models

Python 20,617 1,453 Updated Feb 6, 2025

minimal-cost for training 0.5B R1-Zero

Python 598 79 Updated Feb 26, 2025

Efficient Triton Kernels for LLM Training

Python 4,563 276 Updated Mar 6, 2025

GGUF Quantization support for native ComfyUI models

Python 1,598 104 Updated Mar 5, 2025

Scalable RL solution for advanced reasoning of language models

Python 1,363 82 Updated Feb 19, 2025

LLM Frontend for Power Users.

JavaScript 12,069 2,896 Updated Mar 6, 2025

[SIGMOD 2024] RaBitQ: Quantizing High-Dimensional Vectors with a Theoretical Error Bound for Approximate Nearest Neighbor Search

C++ 79 12 Updated Jan 22, 2025

The Hugging Face Course on Transformers for Audio

MDX 387 108 Updated Feb 21, 2025

This repository contains the Hugging Face Agents Course.

Jupyter Notebook 13,817 835 Updated Mar 5, 2025