Shuai Cheng Li

Department of Computer Science
City University of Hong Kong
83 Tat Chee Avenue
Kowloon
Hong Kong

office: Room G6520, 6/F, AC1, Academic Building
phone: +852-3442-9412
email: shuaicli at cityu.edu.hk
homepage: http://www.cs.cityu.edu.hk/~shuaicli

Research Interests: Bioinformatics, algorithms, machine learning, omics data analysis.

I am recruiting PhD students. Applicants should hold a master's degree or expect to complete one by summer 2021. Outstanding undergraduate students will also be considered. If you are interested and believe you are a strong candidate, please send me your CV.

You can find more details about our PhD program at http://www.sgs.cityu.edu.hk/prospective/rpg

You can also find information about the Hong Kong PhD Fellowship Scheme 2020/21 at http://www.sgs.cityu.edu.hk/prospective/rpg/hkphd

Selected Research Projects


FALCON		FALCON is a software system for protein structure prediction. It ranked among the top 3 protein structure prediction systems on hard targets in CASP8 (the 8th Critical Assessment of Techniques for Protein Structure Prediction, http://robetta.bakerlab.org/CASP8_eval_domains/CASP8.FR_H.First-GDT_MM.html). The system uses a simple position-specific hidden Markov model to predict protein structures. The framework is iterative: it repeats fragment assembly, clustering, target selection, refinement, and consensus within a single process until it converges to a final structure. Our initial implementation converged to within 6 Angstroms of the native structure for 100% of decoys on all 6 standard benchmark proteins used to evaluate ROSETTA, a state-of-the-art system, which achieved only 14% to 94% on the same data. The quality of both the best decoys and the final decoys our system converged to was also notably better. Recently, we completed an automatic system for determining protein structures from NMR spectra. Inferring a structure from NMR data manually usually takes well-trained experts several months of experimentation. Our system, AMR, fully automates this process and reduces the time needed to infer high-resolution structures from several months to one day. The system is a three-part pipeline: peak picking, chemical shift assignment, and structure generation. My work was on the structure generation part, which extends FALCON to work with partial NMR constraints: it accepts chemical shift information, tolerates errors, and refines structures. Initial results show that our system was able to build high-resolution structures comparable to those produced by human experts.
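To make the "within 6 Angstroms for 100% of decoys" criterion concrete, here is a minimal Python sketch that counts the fraction of decoys lying within a distance cutoff of the native structure. It is an illustration under stated assumptions, not FALCON's evaluation code: each structure is assumed to be one representative atom per residue (e.g. C-alpha), and deviation is measured with distance-matrix RMSD (dRMSD), a superposition-free stand-in for the usual RMSD after optimal superposition, so its values are not directly comparable to the numbers above (dRMSD is also blind to mirror images). The names `drmsd` and `fraction_within` are hypothetical.

```python
import math
from itertools import combinations
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def drmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """Distance-matrix RMSD between two equal-length coordinate lists.

    Compares all pairwise intra-chain distances, so no superposition is needed.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two coordinate lists of equal length >= 2")
    pairs = list(combinations(range(len(a)), 2))
    sq_sum = sum(
        (math.dist(a[i], a[j]) - math.dist(b[i], b[j])) ** 2 for i, j in pairs
    )
    return math.sqrt(sq_sum / len(pairs))


def fraction_within(
    native: Sequence[Coord], decoys: Sequence[Sequence[Coord]], cutoff: float = 6.0
) -> float:
    """Fraction of decoys whose dRMSD to the native is at most `cutoff` Angstroms."""
    if not decoys:
        return 0.0
    hits = sum(1 for d in decoys if drmsd(native, d) <= cutoff)
    return hits / len(decoys)


if __name__ == "__main__":
    # Toy 4-residue "native" chain with 3.8 Angstrom spacing along x.
    native = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
    rotated = [(-y, x, z) for x, y, z in native]          # 90 degrees about z, dRMSD 0
    compressed = [(float(i), 0.0, 0.0) for i in range(4)]  # dRMSD about 5.11
    collapsed = [(0.2 * i, 0.0, 0.0) for i in range(4)]    # dRMSD about 6.57
    print(fraction_within(native, [rotated, compressed, collapsed]))  # 0.6666666666666666
```

In the toy data, a rigidly rotated copy of the native chain scores 0 and a mildly compressed chain stays inside the 6 Angstrom cutoff, while a strongly collapsed chain falls outside it, so two of the three decoys count as hits.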

FRAZOR		FRAZOR uses a linear programming model to find structural alphabet candidates for a target sequence. The 3D structure of a protein can be assembled from substructures that correspond to short segments of its sequence. For each short segment, there are only a few likely substructures; together they form the structural alphabet for that segment. Classical approaches such as ROSETTA used sequence profiles and secondary structure information to predict the structural alphabet. In contrast, we used additional structural information, such as solvent accessibility and contact capacity. We used an integer linear programming technique to derive the best combination of this sequence and structural information. With this additional information, we generated significantly more accurate and succinct structural alphabets, a more than 50% improvement over the accuracies previously obtained by others. With these novel structural alphabets, we can construct more accurate protein structures than state-of-the-art ab initio protein structure prediction programs such as ROSETTA. We can also reduce the size of Kolodny's library by a factor of 8 at the same accuracy.
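The following minimal Python sketch illustrates, in toy form, how a weighted combination of sequence and structural scores can rank candidate substructures for one segment and keep the top k as its structural alphabet. It is not FRAZOR's implementation: FRAZOR derives the combination with integer linear programming, whereas this sketch skips that step and assumes the weights are already given. The feature names follow the text above, but their numeric encoding, the weights, and the names `Candidate`, `combined_score`, and `structural_alphabet` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Feature names taken from the text; their numeric encoding here is hypothetical.
FEATURES = (
    "sequence_profile",
    "secondary_structure",
    "solvent_accessibility",
    "contact_capacity",
)


@dataclass
class Candidate:
    """A candidate substructure (fragment) for one target sequence segment."""

    name: str
    # Per-feature agreement with the target segment (higher is better);
    # features missing from the dict count as 0.
    scores: dict[str, float] = field(default_factory=dict)


def combined_score(candidate: Candidate, weights: dict[str, float]) -> float:
    """Weighted linear combination of the per-feature agreement scores."""
    return sum(weights.get(f, 0.0) * candidate.scores.get(f, 0.0) for f in FEATURES)


def structural_alphabet(
    candidates: list[Candidate], weights: dict[str, float], k: int = 3
) -> list[str]:
    """Keep the k highest-scoring candidates as the segment's structural alphabet."""
    ranked = sorted(candidates, key=lambda c: combined_score(c, weights), reverse=True)
    return [c.name for c in ranked[:k]]


if __name__ == "__main__":
    candidates = [
        Candidate("A", {"sequence_profile": 0.9}),
        Candidate("B", {"sequence_profile": 0.6, "solvent_accessibility": 0.9,
                        "contact_capacity": 0.8}),
        Candidate("C", {"secondary_structure": 1.0}),
        Candidate("D", {f: 0.5 for f in FEATURES}),
    ]
    profile_only = {"sequence_profile": 1.0}
    # Hypothetical weights; in FRAZOR the combination would come from the ILP step.
    combined = {"sequence_profile": 1.0, "secondary_structure": 0.5,
                "solvent_accessibility": 0.8, "contact_capacity": 0.7}
    print(structural_alphabet(candidates, profile_only, k=2))  # ['A', 'B']
    print(structural_alphabet(candidates, combined, k=2))      # ['B', 'D']
```

With profile-only weights the alphabet is A and B; once the structural terms are weighted in, D displaces A, which mirrors (only in toy form) the point that solvent accessibility and contact capacity change which substructures are kept.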