zh Wang hhy3

48 followers · 31 following

@zilliztech
Shanghai

Starred repositories

Oneflow-Inc / oneflow

OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient.

C++ 7,261 796 Updated Mar 5, 2025

gpu-mode / lectures

Material for gpu-mode lectures

Jupyter Notebook 3,909 397 Updated Feb 9, 2025

flashinfer-ai / flashinfer

FlashInfer: Kernel Library for LLM Serving

Cuda 2,296 238 Updated Mar 6, 2025

unslothai / unsloth

Finetune Llama 3.3, DeepSeek-R1 & Reasoning LLMs 2x faster with 70% less memory! 🦥

Python 33,661 2,355 Updated Mar 6, 2025

deepseek-ai / smallpond

A lightweight data processing framework built on DuckDB and 3FS.

Python 3,833 311 Updated Mar 5, 2025

deepseek-ai / 3FS

A high-performance distributed file system designed to address the challenges of AI training and inference workloads.

C++ 7,528 647 Updated Mar 6, 2025

rigtorp / ipc-bench

Latency benchmarks of Unix IPC mechanisms

C 567 167 Updated Sep 25, 2023

deepseek-ai / EPLB

Expert Parallelism Load Balancer

Python 1,018 146 Updated Feb 27, 2025

deepseek-ai / profile-data

Analyze computation-communication overlap in V3/R1.

886 113 Updated Mar 3, 2025

deepseek-ai / DualPipe

A bidirectional pipeline parallelism algorithm for computation-communication overlap in V3/R1 training.

Python 2,504 239 Updated Mar 5, 2025

anthropics / claude-code

Claude Code is an agentic coding tool that lives in your terminal, understands your codebase, and helps you code faster by executing routine tasks, explaining complex code, and handling git workflo…

Shell 5,782 280 Updated Mar 6, 2025

deepseek-ai / DeepGEMM

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda 4,798 464 Updated Mar 5, 2025

deepseek-ai / DeepEP

DeepEP: an efficient expert-parallel communication library

Cuda 7,023 599 Updated Mar 6, 2025

deepseek-ai / FlashMLA

FlashMLA: Efficient MLA decoding kernels

C++ 11,160 774 Updated Mar 1, 2025

deepseek-ai / open-infra-index

Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation

6,631 186 Updated Mar 4, 2025

bitsandbytes-foundation / bitsandbytes

Accessible large language models via k-bit quantization for PyTorch.

Python 6,762 670 Updated Mar 4, 2025

huggingface / optimum-benchmark

🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of Optimum's hardware optimizations & quantization schemes.

Python 289 56 Updated Jan 31, 2025