I am a postdoc at Weill Cornell Medicine, Cornell University. Prior to joining this, I worked as a postdoc in Computer Science at the University of Illinois at Chicago. I have a broad research interest in machine learning, data mining, and biomedical informatics, particularly with tensor analysis. Supervisors: Postdoctoral Associate
The assessment of Alzheimer's Disease (AD) and Mild Cognitive Impairment (MCI) associated wit... more The assessment of Alzheimer's Disease (AD) and Mild Cognitive Impairment (MCI) associated with brain changes remains a challenging task. Recent studies have demonstrated that combination of multi-modality imaging techniques can better reflect pathological characteristics and contribute to more accurate diagnosis of AD and MCI. In this paper, we propose a novel tensor-based multi-modality feature selection and regression method for diagnosis and biomarker identification of AD and MCI from normal controls. Specifically, we leverage the tensor structure to exploit high-level correlation information inherent in the multimodality data, and investigate tensor-level sparsity in the multilinear regression model. We present the practical advantages of our method for the analysis of ADNI data using three imaging modalities (VBM-MRI, FDG-PET and AV45-PET) with clinical parameters of disease severity and cognitive scores. The experimental results demonstrate the superior performance of our ...
This paper presents HBcompare, a method that classifies protein structures according to ligand bi... more This paper presents HBcompare, a method that classifies protein structures according to ligand binding preference categories by analyzing hydrogen bond topology. HBcompare excludes other characteristics of protein structure so that, in the event of accurate classification, it can implicate the involvement of hydrogen bonds in selective binding. This approach contrasts from methods that represent many aspects of protein structure because holistic representations cannot associate classification with just one characteristic. To our knowledge, HBcompare is the first technique with this capability. On five datasets of proteins that catalyze similar reactions with different preferred ligands, HBcompare correctly categorized proteins with similar ligand binding preferences 89.5% of the time. Using only hydrogen bond topology, classification accuracy with HBcompare surpassed standard structure-based comparison algorithms that use atomic coordinates. As a tool for implicating the role of hyd...
ACM Transactions on Intelligent Systems and Technology
Along with the rapid expansion of information technology and digitalization of health data, there... more Along with the rapid expansion of information technology and digitalization of health data, there is an increasing concern on maintaining data privacy while garnering the benefits in the medical field. Two critical challenges are identified: First, medical data is naturally distributed across multiple local sites, making it difficult to collectively train machine learning models without data leakage. Second, in medical applications, data are often collected from different sources and views, resulting in heterogeneity and complexity that requires reconciliation. In this article, we present a generic Federated Multi-view Learning (FedMV) framework for multi-view data leakage prevention. Specifically, we apply this framework to two types of problems based on local data availability: Vertical Federated Multi-view Learning (V-FedMV) and Horizontal Federated Multi-view Learning (H-FedMV). We experimented with real-world keyboard data collected from BiAffect study. Our results demonstrated...
With recent advances in data collection from multiple sources, multi-view data has received signi... more With recent advances in data collection from multiple sources, multi-view data has received significant attention. In multi-view data, each view represents a different perspective of data. Since label information is often expensive to acquire, multiview clustering has gained growing interest, which aims to obtain better clustering solution by exploiting complementary and consistent information across all views rather than only using an individual view. Due to inevitable sensor failures, data in each view may contain error. Error often exhibits as noise or feature-specific corruptions or outliers. Multi-view data may contain any or combination of these error types. Blindly clustering multi-view data i.e., without considering possible error in view(s) could significantly degrade the performance. The goal of error-robust multi-view clustering is to obtain useful outcome even if the multi-view data is corrupted. Existing error-robust multi-view clustering approaches with explicit error ...
Ensuring the privacy of sensitive data used to train modern machine learning models is of paramou... more Ensuring the privacy of sensitive data used to train modern machine learning models is of paramount importance in many areas of practice. One recent popular approach to study these concerns is using the differential privacy via a "teacher-student" model, wherein the teacher provides the student with useful, but noisy, information, hopefully allowing the student model to perform well on a given task. However, these studies only solve the privacy concerns of the teacher by assuming the student owns a public but unlabelled dataset. In real life, the student also has privacy concerns on its unlabelled data, so as to inquire about privacy protection on any data sent to the teacher. In this work, we re-design the privacy-preserving "teacher-student" model consisting of adopting both private arbitrary masking and local differential privacy, which protects the sensitive information of each student sample. However, the traditional training of teacher model is not robust o...
Depression is one of the most common mental illness problems, and the symptoms shown by patients ... more Depression is one of the most common mental illness problems, and the symptoms shown by patients are not consistent, making it difficult to diagnose in the process of clinical practice and pathological research. Although researchers hope that artificial intelligence can contribute to the diagnosis and treatment of depression, the traditional centralized machine learning needs to aggregate patient data, and the data privacy of patients with mental illness needs to be strictly confidential, which hinders machine learning algorithms clinical application. To solve the problem of privacy of the medical history of patients with depression, we implement federated learning to analyze and diagnose depression. First, we propose a general multi-view federated learning framework using multi-source data,which can extend any traditional machine learning model to support federated learning across different institutions or parties. Secondly, we adopt late fusion methods to solve the problem of inco...
The assessment of Alzheimer's Disease (AD) and Mild Cognitive Impairment (MCI) associated wit... more The assessment of Alzheimer's Disease (AD) and Mild Cognitive Impairment (MCI) associated with brain changes remains a challenging task. Recent studies have demonstrated that combination of multi-modality imaging techniques can better reflect pathological characteristics and contribute to more accurate diagnosis of AD and MCI. In this paper, we propose a novel tensor-based multi-modality feature selection and regression method for diagnosis and biomarker identification of AD and MCI from normal controls. Specifically, we leverage the tensor structure to exploit high-level correlation information inherent in the multimodality data, and investigate tensor-level sparsity in the multilinear regression model. We present the practical advantages of our method for the analysis of ADNI data using three imaging modalities (VBM-MRI, FDG-PET and AV45-PET) with clinical parameters of disease severity and cognitive scores. The experimental results demonstrate the superior performance of our ...
This paper presents HBcompare, a method that classifies protein structures according to ligand bi... more This paper presents HBcompare, a method that classifies protein structures according to ligand binding preference categories by analyzing hydrogen bond topology. HBcompare excludes other characteristics of protein structure so that, in the event of accurate classification, it can implicate the involvement of hydrogen bonds in selective binding. This approach contrasts from methods that represent many aspects of protein structure because holistic representations cannot associate classification with just one characteristic. To our knowledge, HBcompare is the first technique with this capability. On five datasets of proteins that catalyze similar reactions with different preferred ligands, HBcompare correctly categorized proteins with similar ligand binding preferences 89.5% of the time. Using only hydrogen bond topology, classification accuracy with HBcompare surpassed standard structure-based comparison algorithms that use atomic coordinates. As a tool for implicating the role of hyd...
ACM Transactions on Intelligent Systems and Technology
Along with the rapid expansion of information technology and digitalization of health data, there... more Along with the rapid expansion of information technology and digitalization of health data, there is an increasing concern on maintaining data privacy while garnering the benefits in the medical field. Two critical challenges are identified: First, medical data is naturally distributed across multiple local sites, making it difficult to collectively train machine learning models without data leakage. Second, in medical applications, data are often collected from different sources and views, resulting in heterogeneity and complexity that requires reconciliation. In this article, we present a generic Federated Multi-view Learning (FedMV) framework for multi-view data leakage prevention. Specifically, we apply this framework to two types of problems based on local data availability: Vertical Federated Multi-view Learning (V-FedMV) and Horizontal Federated Multi-view Learning (H-FedMV). We experimented with real-world keyboard data collected from BiAffect study. Our results demonstrated...
With recent advances in data collection from multiple sources, multi-view data has received signi... more With recent advances in data collection from multiple sources, multi-view data has received significant attention. In multi-view data, each view represents a different perspective of data. Since label information is often expensive to acquire, multiview clustering has gained growing interest, which aims to obtain better clustering solution by exploiting complementary and consistent information across all views rather than only using an individual view. Due to inevitable sensor failures, data in each view may contain error. Error often exhibits as noise or feature-specific corruptions or outliers. Multi-view data may contain any or combination of these error types. Blindly clustering multi-view data i.e., without considering possible error in view(s) could significantly degrade the performance. The goal of error-robust multi-view clustering is to obtain useful outcome even if the multi-view data is corrupted. Existing error-robust multi-view clustering approaches with explicit error ...
Ensuring the privacy of sensitive data used to train modern machine learning models is of paramou... more Ensuring the privacy of sensitive data used to train modern machine learning models is of paramount importance in many areas of practice. One recent popular approach to study these concerns is using the differential privacy via a "teacher-student" model, wherein the teacher provides the student with useful, but noisy, information, hopefully allowing the student model to perform well on a given task. However, these studies only solve the privacy concerns of the teacher by assuming the student owns a public but unlabelled dataset. In real life, the student also has privacy concerns on its unlabelled data, so as to inquire about privacy protection on any data sent to the teacher. In this work, we re-design the privacy-preserving "teacher-student" model consisting of adopting both private arbitrary masking and local differential privacy, which protects the sensitive information of each student sample. However, the traditional training of teacher model is not robust o...
Depression is one of the most common mental illness problems, and the symptoms shown by patients ... more Depression is one of the most common mental illness problems, and the symptoms shown by patients are not consistent, making it difficult to diagnose in the process of clinical practice and pathological research. Although researchers hope that artificial intelligence can contribute to the diagnosis and treatment of depression, the traditional centralized machine learning needs to aggregate patient data, and the data privacy of patients with mental illness needs to be strictly confidential, which hinders machine learning algorithms clinical application. To solve the problem of privacy of the medical history of patients with depression, we implement federated learning to analyze and diagnose depression. First, we propose a general multi-view federated learning framework using multi-source data,which can extend any traditional machine learning model to support federated learning across different institutions or parties. Secondly, we adopt late fusion methods to solve the problem of inco...
Uploads
Papers by Lifang He