Zhuo Su 苏卓

I am a senior researcher/engineer at ByteDance. Previously, I served as a senior researcher at Tencent (Recruitment Talents Program: "技术大咖"). Before that, I earned my Master's degree from the Department of Automation, Tsinghua University, supervised by Prof. Qionghai Dai and Prof. Lu Fang, while also working closely with Prof. Yebin Liu and Lan Xu.

My mission is to capture and understand dynamic human-centric scenes in the real world, and to digitalize humans, objects, and events for immersive applications in virtual and augmented reality. My research centers on computer vision and graphics, especially 3D human generation, avatar creation, 4D reconstruction, neural rendering, motion capture, and related areas.

I am looking for full-time partners and research interns. Please feel free to email me if you are interested in the topics above.

Email: suzhuo13@gmail.com | suzhuo@bytedance.com

Background | Research | Awards | Skills | Services | Google Scholar

Background

M.S., Department of Automation, Tsinghua University, Beijing, China. 2018.08-2021.06
GPA: 3.89/4.0 (GPA ranking: 5/137)
B.E., Department of Automation, Northeastern University, Shenyang, China. 2014.09-2018.06
GPA: 4.18/5.0 (GPA ranking: 5/276; Comprehensive ranking: 1/276)

Research

3D Generation | Avatar Creation | 4D Reconstruction | Neural Rendering | Motion Capture

1. 3D Generation

Joint2Human: High-quality 3D Human Generation via Compact Spherical Embedding of 3D Joints
Muxin Zhang, Qiao Feng, Zhuo Su, Chao Wen, Zhou Xue, Kun Li
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2024.

We introduce Joint2Human, a novel method that leverages 2D diffusion models to generate detailed 3D human geometry directly, ensuring both global structure and local details.
[Paper] [Project page]

HumanSplat: Generalizable Single-Image Human Gaussian Splatting with Structure Priors
Panwang Pan*, Zhuo Su* (Project Lead), Chenguo Lin*, Zhen Fan, Yongjie Zhang, Zeming Li, Tingting Shen, Yadong Mu, Yebin Liu (*Equal Contribution)
arXiv, 2024.

We propose HumanSplat, a method that predicts the 3D Gaussian Splatting properties of a human from a single input image in a generalizable way. It utilizes a 2D multi-view diffusion model and a latent reconstruction transformer with human structure priors to effectively integrate geometric priors and semantic features.
[Paper] [Project page]

2. Avatar Creation

OHTA: One-shot Hand Avatar via Data-driven Implicit Priors
Xiaozheng Zheng, Chao Wen, Zhuo Su, Zeran Xu, Zhaohu Li, Yang Zhao, Zhou Xue
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2024.

OHTA is a novel approach capable of creating implicit animatable hand avatars using just a single image. It facilitates 1) text-to-avatar conversion, 2) hand texture and geometry editing, and 3) interpolation and sampling within the latent space.
[Paper] [Project page]

HeadGAP: Few-shot 3D Head Avatar via Generalizable GAussian Priors
Xiaozheng Zheng, Chao Wen, Zhaohu Li, Weiyi Zhang, Zhuo Su, Xu Chang, Yang Zhao, Zheng Lv, Xiaoyuan Zhang, Yongjie Zhang, Guidong Wang, Lan Xu
arXiv, 2024.

We propose a 3D head avatar creation method that generalizes from few-shot in-the-wild data. By using 3D head priors from a large-scale dataset and a Gaussian Splatting-based network, our approach achieves high-fidelity rendering and robust animation.
[Paper] [Project page]

3. 4D Reconstruction

UnstructuredFusion: Realtime 4D Geometry and Texture Reconstruction using Commercial RGBD Cameras
Lan Xu, Zhuo Su, Lei Han, Tao Yu, Yebin Liu, Lu Fang
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2019.

We propose UnstructuredFusion, which allows realtime, high-quality, complete reconstruction of 4D textured models of human performance via only three commercial RGBD cameras.
[Paper] [Project page]

RobustFusion: Human Volumetric Capture with Data-driven Visual Cues using a RGBD Camera
Zhuo Su, Lan Xu, Zerong Zheng, Tao Yu, Yebin Liu, Lu Fang
European Conference on Computer Vision (ECCV), 2020, Spotlight.

We introduce a robust human volumetric capture approach combined with various data-driven visual cues using a Kinect, which outperforms existing state-of-the-art approaches significantly.
[Paper] [Project page]

Robust Volumetric Performance Reconstruction under Human-object Interactions from Monocular RGBD Stream
Zhuo Su, Lan Xu, Dawei Zhong, Zhong Li, Fan Deng, Shuxue Quan, Lu Fang
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022.

We propose a robust volumetric performance reconstruction system for human-object interaction scenarios using only a single RGBD sensor, which combines various data-driven visual and interaction cues to handle the complex interaction patterns and severe occlusions.
[Paper] [Project page]

4. Neural Rendering

NeuralHOFusion: Neural Volumetric Rendering Under Human-Object Interactions
Yuheng Jiang, Suyi Jiang, Guoxing Sun, Zhuo Su, Kaiwen Guo, Minye Wu, Jingyi Yu, Lan Xu
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022.

We propose a robust neural volumetric rendering method for human-object interaction scenarios using 6 RGBD cameras, which achieves layer-wise and photorealistic reconstruction results of human performance in novel views.
[Paper] [Project page]

Instant-NVR: Instant Neural Volumetric Rendering for Human-object Interactions from Monocular RGBD Stream
Yuheng Jiang, Kaixin Yao, Zhuo Su, Zhehao Shen, Haimin Luo, Lan Xu
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023.

We propose a neural approach for instant volumetric human-object tracking and rendering using a single RGBD camera. It bridges traditional non-rigid tracking with recent instant radiance field techniques via a multi-thread tracking-rendering mechanism.
[Paper] [Project page]

HiFi4G: High-Fidelity Human Performance Rendering via Compact Gaussian Splatting
Yuheng Jiang, Zhehao Shen, Penghao Wang, Zhuo Su, Yu Hong, Yingliang Zhang, Jingyi Yu, Lan Xu
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2024.

We present an explicit and compact Gaussian-based approach for high-fidelity human performance rendering from dense footage. Our core intuition is to marry the 3D Gaussian representation with non-rigid tracking.
[Paper] [Project page]

5. Motion Capture

Learning Variational Motion Prior for Video-based Motion Capture
Xin Chen*, Zhuo Su*, Lingbo Yang*, Pei Cheng, Lan Xu, Gang Yu (*Equal Contribution)
arXiv, 2022.

We propose a novel variational motion prior (VMP) learning approach for video-based motion capture. Specifically, VMP is implemented as a transformer-based variational autoencoder pretrained over large-scale 3D motion data, providing an expressive latent space for human motion at sequence level.
[Paper]

Realistic Full-Body Tracking from Sparse Observations via Joint-Level Modeling
Xiaozheng Zheng*, Zhuo Su*, Chao Wen, Zhou Xue, Xiaojie Jin (*Equal Contribution)
IEEE/CVF International Conference on Computer Vision (ICCV), 2023.

We propose a two-stage framework that can obtain accurate and smooth full-body motions with the three tracking signals of head and hands only, in which we first explicitly model the joint-level features and then utilize them as spatiotemporal transformer tokens to capture joint-level correlations.
[Paper] [Project page]

HMD-Poser: On-Device Real-time Human Motion Tracking from Scalable Sparse Observations
Peng Dai, Yang Zhang, Tao Liu, Zhen Fan, Tianyuan Du, Zhuo Su, Xiaozheng Zheng, Zeming Li
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2024.

We propose HMD-Poser, the first unified approach to recover full-body motions using scalable sparse observations from HMD and body-worn IMUs. In particular, it can support a variety of input scenarios, such as HMD, HMD+2IMUs, HMD+3IMUs, etc.
[Paper] [Project page]

EMHI: A Multimodal Egocentric Human Motion Dataset with HMD and Body-Worn IMUs
Zhen Fan*, Peng Dai*, Zhuo Su*, Xu Gao, Zheng Lv, Jiarui Zhang, Tianyuan Du, Guidong Wang, Yang Zhang (*Equal Contribution)
arXiv, 2024.

We introduce EMHI, a dataset combining stereo images from headsets and IMU data for egocentric human motion capture in VR. It includes 28.5 hours of data from 58 subjects. We also propose MEPoser, a method that effectively uses this multimodal data for improved pose estimation.
[Paper] [Project page: Coming soon]

Patents & early publications

Lu Fang, Zhuo Su, Lei Han, Qionghai Dai, “Depth camera calibration method and device, electronic equipment and storage medium”, CN:201810179738:A
Lu Fang, Lei Han, Zhuo Su, Qionghai Dai, “A three-dimensional rebuilding method and device based on a depth camera, an apparatus and a storage medium”, CN:201810179264:A
Lu Fang, Zhuo Su, Lan Xu, “Dynamic three-dimensional reconstruction method, device, equipment, medium and system”, CN:201910110062:A
Lu Fang, Zhuo Su, Lan Xu, “Texture real-time determination method, device and equipment for dynamic scene and medium”, CN:201910110044:A
Lu Fang, Zhuo Su, Lan Xu, Jianwei Wen, Chao Yuan, “Dynamic human body three-dimensional reconstruction method, device, equipment and medium”, CN:202010838902:A
Lu Fang, Zhuo Su, Lan Xu, Jianwei Wen, Chao Yuan, “Dynamic human body three-dimensional model completion method and device, equipment and medium”, CN:202010838890:A
Zhuo Su, Xiaozhe Wang, Wen Fei, Changfu Zhou, “Multi-feature information landmark detection method for precise landing of unmanned aerial vehicle”, CN:201710197369:A
Wen Fei, Zhuo Su* (*corresponding author), Changfu Zhou, “Artificial landmark design and detection using hierarchy information for UAV localization and landing”, Chinese Control And Decision Conference 2017 (CCDC 2017), [Paper]
Haina Wu, Zhuo Su, Kai Luo, Qi Wang, Xianzhong Cheng, “Exploration and Research on the Movement of Magnus Glider”, Physical Experiment of College, 2015 (5): 2

Awards

Outstanding Graduate of Beijing, Beijing, 2021
Outstanding Graduate of Department of Automation, Tsinghua University, 2021
Excellent Bachelor Thesis Award, Northeastern University, 2018
Outstanding Graduate of Liaoning Province, Liaoning Province, 2018
National Scholarship, Ministry of Education, 2018
Excellence Award for National Undergraduate Innovation Program, Northeastern University, 2017
City's Excellent Undergraduate, Shenyang City, 2017
Mayor's Scholarship, Shenyang City, 2017
Top Ten Excellent Undergraduate (10 / the whole university, 十佳本科生), Northeastern University, 2017
Honorable Mention of American Mathematical Contest in Modeling, COMAP, 2017
Second Prize of National Undergraduate Mathematical Contest in Modeling, CSIAM, 2016
First Prize of Provincial Undergraduate Mathematical Contest in Modeling, Liaoning Province, 2016
2x Second Prize of Electronic Design Contest, Education Department of Liaoning Province, 2015-2016
4x First Class Scholarships, Northeastern University, 2015-2018

Skills

C & C++ (OpenCV, OpenGL, CUDA, Eigen, ...), Python (PyTorch), MATLAB, LaTeX, ...

Services

Reviewer for CVPR, NeurIPS, ICLR, IEEE VR, 3DV, ...
