LessUp/README.md

Focused on engineering practice in AI infrastructure and high-performance computing

🔬 Focus: CUDA Kernel Optimization · LLM Inference · GPU Computing
🌱 Exploring: AI inference acceleration & large-scale pipeline orchestration
🤝 Open to: Research collaboration & open-source co-building








👨‍💻 About Me

I build AI infrastructure and high-performance systems with C++/CUDA, Python, and Go.
My main focus is engineering practice in AI infrastructure, GPU kernel optimization, and high-performance computing.

  • 🔥 GPU Kernel Engineering — CUDA/Triton kernel optimization, FlashAttention, GEMM, quantization
  • 🧠 AI Inference Systems — LLM inference engines, model quantization (W8A16/FP8), KV Cache
  • High-Performance Computing — N-body simulation, ray tracing, image processing pipelines
  • 🌐 Real-time Systems — WebRTC signaling, real-time detection, digital human platform


🚀 Projects

⚡ GPU Kernel Optimization

Modern C++17/CUDA AI kernel library — Elementwise, GEMM, FlashAttention, Conv2D, SpMV, FP8 quantization

C++17 CUDA Tensor Core

From naive 3-loop to Tensor Core: progressive SGEMM optimization reaching 40% of cuBLAS performance

CUDA WMMA Roofline

RMSNorm+RoPE fusion, Gated MLP fusion, FP8 GEMM with auto-tuning for Transformers

Triton FP8 Python

CUDA LLM kernel library — FlashAttention (online softmax), FP16/INT8 GEMM with Tensor Core

CUDA PyTorch FlashAttention

🧠 AI Inference Engines

Lightweight LLM inference engine — W8A16 quantization, KV Cache, multi-sampling strategies

CUDA C++17 INT8

7-level GEMM optimization (Naive→Tensor Core), reaching 72% of cuBLAS performance, with an MNIST demo

CUDA C++17 FP16

WebGPU micro inference engine — Conv2d, kernel fusion, Im2Col, MNIST classification

WebGPU TypeScript WGSL

Multi-model real-time vision — YOLO/DETR/OWL-ViT/BLIP with WebSocket streaming

FastAPI YOLOv8 Docker

🎮 GPU Computing & Simulation

Pure CUDA ray tracer — Phong shading, path tracing, BVH acceleration, warp divergence optimization

CUDA Path Tracing BVH

Million-particle GPU simulation — Direct N², Barnes-Hut, Spatial Hash with CUDA-OpenGL interop

CUDA OpenGL Barnes-Hut

10K-particle real-time fluid simulation using WebGPU compute shaders with trail effects

WebGPU TypeScript WGSL

CUDA image processing library — convolution, morphology, geometric transforms, pipeline processing

CUDA C++17 Image Processing

DAG-based heterogeneous image pipeline — multi-stream scheduling, pinned memory pool

CUDA C++17 DAG

🌐 Applications

3D digital human platform — Three.js rendering, voice interaction, behavior control, emotion FSM

React Three.js TypeScript

Minimal WebRTC demo — Go WebSocket signaling, room management, peer-to-peer media

Go WebRTC Docker

E2E encrypted note sync — AES-256, 12-word mnemonic, real-time collaboration via WebSocket

React Express Socket.IO

Browser-based memory training — N-back, spaced reinforcement, adaptive difficulty, PWA

JavaScript Tailwind PWA


🎓 Education & Experience

🎓 Education

Xidian University

Background in communications and information engineering.

💼 Experience

Mindray · ZEGO · BGI

Medical imaging, RTC, and genomic data engineering.


🛠️ Tech Stack

Category          Technologies
Languages
AI & HPC          CUDA · Triton · cuBLAS · Tensor Core · WebGPU
System & DevOps
Web & Frontend

📫 Connect with me

Feel free to reach out by email for collaboration, technical discussions, or open-source ideas.

Email   GitHub


Pinned

  1. bookmarks-cleaner (Public)

    Smart Bookmark Cleanup & Classification: Rules + ML + Optional LLM, Dedup, Title Cleanup & Multi-Format Export (Python CLI)

    Python 2

  2. awesome-cursorrules-zh (Public)

    💻✨ A collection of Cursor AI coding rules optimized for Chinese-speaking developers

    123 17

  3. meta-human (Public)

    AI Digital Human Platform: Speech Recognition + LLM Chat + TTS + 3D Avatar (Next.js + Three.js)

    TypeScript 7 2

  4. webrtc (Public)

    Minimal WebRTC Demo (Go + Pion): WebSocket Signaling, Browser P2P Audio/Video Calls & Docker Deployment

    JavaScript 1

  5. yolo-toys (Public)

    Multi-Model Real-Time Vision System: YOLO/DETR/OWL-ViT/BLIP Dynamic Switching, FastAPI + WebSocket Inference

    Python 1