LessUp

Shenzhen

LessUp/README.md

Focused on engineering practice in AI infrastructure and high-performance computing

🔬 Focus: CUDA Kernel Optimization · LLM Inference · GPU Computing
🌱 Exploring: AI inference acceleration & large-scale pipeline orchestration
🤝 Open to: Research collaboration & open-source co-building

👨‍💻 About Me

I build AI infrastructure and high-performance systems with C++/CUDA, Python, and Go.
My main focus is engineering practice in AI infrastructure, GPU kernel optimization, and high-performance computing.

🔥 GPU Kernel Engineering — CUDA/Triton kernel optimization, FlashAttention, GEMM, quantization
🧠 AI Inference Systems — LLM inference engines, model quantization (W8A16/FP8), KV Cache
⚡ High-Performance Computing — N-body simulation, ray tracing, image processing pipelines
🌐 Real-time Systems — WebRTC signaling, real-time detection, digital human platform

🚀 Projects

⚡ GPU Kernel Optimization

🔷 TensorCraft-HPC

Modern C++17/CUDA AI kernel library — Elementwise, GEMM, FlashAttention, Conv2D, SpMV, FP8 quantization

🔷 SGEMM Optimization

Progressive SGEMM optimization from a naive triple loop to Tensor Cores, reaching 40% of cuBLAS performance

🔷 Triton Fused Ops

RMSNorm+RoPE fusion, Gated MLP fusion, FP8 GEMM with auto-tuning for Transformers

🔷 LLM-Speed

CUDA LLM kernel library — FlashAttention (online softmax), FP16/INT8 GEMM with Tensor Core
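
As a rough illustration of the online-softmax idea named in the LLM-Speed entry, here is a minimal Python sketch. It is not code from that repository. It computes one attention output row by streaming over blocks of scores while keeping a running maximum, a running normalizer, and a rescaled accumulator. The function names `attention_row_online` and `attention_row_reference`, the use of scalar values instead of vectors, and the default block size are all assumptions made for illustration.

```python
import math


def attention_row_reference(scores, values):
    """Plain softmax-weighted sum; needs all scores before it can start."""
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def attention_row_online(scores, values, block_size=2):
    """Same result, computed block by block with running statistics."""
    running_max = float("-inf")  # largest score seen so far
    running_sum = 0.0            # softmax denominator, relative to running_max
    acc = 0.0                    # weighted sum of values, relative to running_max
    for start in range(0, len(scores), block_size):
        s_block = scores[start:start + block_size]
        v_block = values[start:start + block_size]
        new_max = max(running_max, max(s_block))
        # Rescale earlier partial results so they are expressed against new_max.
        scale = math.exp(running_max - new_max)
        weights = [math.exp(s - new_max) for s in s_block]
        running_sum = running_sum * scale + sum(weights)
        acc = acc * scale + sum(w * v for w, v in zip(weights, v_block))
        running_max = new_max
    return acc / running_sum
```

Because earlier partial sums are rescaled whenever the maximum grows, the full score row never has to be held at once, which is what lets a fused GPU kernel process attention tile by tile.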

🧠 AI Inference Engines

🟢 Tiny-LLM

Lightweight LLM inference engine — W8A16 quantization, KV Cache, multi-sampling strategies

🟢 Mini Inference Engine

7-level GEMM optimization (Naive→Tensor Core), reaching 72% of cuBLAS performance, with an MNIST demo

🟢 Tiny-DL-Inference

WebGPU micro inference engine — Conv2d, kernel fusion, Im2Col, MNIST classification

🟢 YOLO-Toys

Multi-model real-time vision — YOLO/DETR/OWL-ViT/BLIP with WebSocket streaming
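
For readers unfamiliar with the W8A16 scheme mentioned in the Tiny-LLM entry, the following is a minimal Python sketch of weight-only int8 quantization. It is not taken from that project. Weights are stored as int8 with one scale per output row, while activations stay unquantized; plain Python floats stand in for FP16 activations here. The names `quantize_weights_int8` and `linear_w8a16`, and the choice of symmetric per-row scaling, are assumptions made for illustration.

```python
def quantize_weights_int8(weight_rows):
    """Symmetric per-row quantization: w is approximated by scale * q, q in [-127, 127]."""
    q_rows, scales = [], []
    for row in weight_rows:
        max_abs = max(abs(w) for w in row)
        # Avoid a zero scale for an all-zero row.
        scale = max_abs / 127.0 if max_abs > 0 else 1.0
        q_rows.append([max(-127, min(127, round(w / scale))) for w in row])
        scales.append(scale)
    return q_rows, scales


def linear_w8a16(x, q_rows, scales):
    """y[i] = scales[i] * sum_j q[i][j] * x[j]; activations x are not quantized."""
    return [
        scale * sum(q * xj for q, xj in zip(row, x))
        for row, scale in zip(q_rows, scales)
    ]


# Usage: quantize once, then reuse the int8 weights for every input vector.
weights = [[0.5, -1.0, 0.25], [2.0, 0.0, -0.5]]
q_rows, scales = quantize_weights_int8(weights)
y = linear_w8a16([1.0, 2.0, -1.0], q_rows, scales)
```

Applying the scale once per output row, after the integer-weighted sum, keeps the stored weights at one byte each; the rounding error per weight is at most half a scale step.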

🎮 GPU Computing & Simulation

🟠 CUDA Ray Tracer

Pure CUDA ray tracer — Phong shading, path tracing, BVH acceleration, warp divergence optimization

🟠 N-Body Simulation

Million-particle GPU simulation — Direct N², Barnes-Hut, Spatial Hash with CUDA-OpenGL interop

🟠 Particle Fluid Sim

10K-particle real-time fluid simulation using WebGPU compute shaders with trail effects

🟠 Mini-OpenCV

CUDA image processing library — convolution, morphology, geometric transforms, pipeline processing

🟠 Mini-ImagePipe

DAG-based heterogeneous image pipeline — multi-stream scheduling, pinned memory pool

🌐 Applications

🟣 MetaHuman

3D digital human platform — Three.js rendering, voice interaction, behavior control, emotion FSM

🟣 WebRTC

Minimal WebRTC demo — Go WebSocket signaling, room management, peer-to-peer media

🟣 Note Sync Now

E2E encrypted note sync — AES-256, 12-word mnemonic, real-time collaboration via WebSocket

🟣 Mind Gym

Browser-based memory training — N-back, spaced reinforcement, adaptive difficulty, PWA

🎓 Education & Experience

🎓 Education

Xidian University

Background in communications and information engineering.

💼 Experience

Mindray · ZEGO · BGI

Medical imaging, real-time audio/video (RTC), and genomic data engineering.

🛠️ Tech Stack

Category	Technologies
Languages
AI & HPC	CUDA · Triton · cuBLAS · Tensor Core · WebGPU
System & DevOps
Web & Frontend

📫 Connect with me

Feel free to reach out by email for collaboration, technical discussions, or open-source ideas.

Pinned

bookmarks-cleaner (Public)

Smart Bookmark Cleanup & Classification: Rules + ML + Optional LLM, Dedup, Title Cleanup & Multi-Format Export (Python CLI)

Python · 2 stars

awesome-cursorrules-zh (Public)

💻✨ A collection of Cursor AI coding rules optimized for Chinese-speaking developers

123 stars · 17 forks

meta-human (Public)

AI Digital Human Platform: Speech Recognition + LLM Chat + TTS + 3D Avatar (Next.js + Three.js)

TypeScript · 7 stars · 2 forks

webrtc (Public)

Minimal WebRTC Demo (Go + Pion): WebSocket Signaling, Browser P2P Audio/Video Calls & Docker Deployment

JavaScript · 1 star

yolo-toys (Public)

Multi-Model Real-Time Vision System: YOLO/DETR/OWL-ViT/BLIP Dynamic Switching, FastAPI + WebSocket Inference

Python · 1 star