
Lifang He


I am a postdoc at Weill Cornell Medicine, Cornell University. Previously, I was a postdoc in Computer Science at the University of Illinois at Chicago. My research interests span machine learning, data mining, and biomedical informatics, with a particular focus on tensor analysis.
Supervisors: Postdoctoral Associate


Papers by Lifang He

Federated Depression Detection from Multi-Source Mobile Health Data

arXiv (Cornell University), Feb 6, 2021

Lime: Low-Cost and Incremental Learning for Dynamic Heterogeneous Information Networks

IEEE Transactions on Computers, Mar 1, 2022

SUGAR: Subgraph Neural Network with Reinforcement Pooling and Self-Supervised Mutual Information Mechanism

arXiv (Cornell University), Jan 20, 2021

Deep learning for drug repurposing: Methods, databases, and applications

Wiley Interdisciplinary Reviews: Computational Molecular Science, Feb 8, 2022

A Comprehensive Survey on Pretrained Foundation Models: A History from BERT to ChatGPT

arXiv (Cornell University), Feb 18, 2023

DC-FUDA: Improving deep clustering via fully unsupervised domain adaptation

Neurocomputing

Deep Clustering: A Comprehensive Survey

arXiv (Cornell University), Oct 8, 2022

Tensor-based Multi-Modality Feature Selection and Regression for Alzheimer’s Disease Diagnosis

Artificial Intelligence and Applications

The assessment of Alzheimer's Disease (AD) and Mild Cognitive Impairment (MCI) associated with brain changes remains a challenging task. Recent studies have demonstrated that combination of multi-modality imaging techniques can better reflect pathological characteristics and contribute to more accurate diagnosis of AD and MCI. In this paper, we propose a novel tensor-based multi-modality feature selection and regression method for diagnosis and biomarker identification of AD and MCI from normal controls. Specifically, we leverage the tensor structure to exploit high-level correlation information inherent in the multimodality data, and investigate tensor-level sparsity in the multilinear regression model. We present the practical advantages of our method for the analysis of ADNI data using three imaging modalities (VBM-MRI, FDG-PET and AV45-PET) with clinical parameters of disease severity and cognitive scores. The experimental results demonstrate the superior performance of our ...

HBcompare: Classifying Ligand Binding Preferences with Hydrogen Bond Topology

Biomolecules

This paper presents HBcompare, a method that classifies protein structures according to ligand binding preference categories by analyzing hydrogen bond topology. HBcompare excludes other characteristics of protein structure so that, in the event of accurate classification, it can implicate the involvement of hydrogen bonds in selective binding. This approach contrasts from methods that represent many aspects of protein structure because holistic representations cannot associate classification with just one characteristic. To our knowledge, HBcompare is the first technique with this capability. On five datasets of proteins that catalyze similar reactions with different preferred ligands, HBcompare correctly categorized proteins with similar ligand binding preferences 89.5% of the time. Using only hydrogen bond topology, classification accuracy with HBcompare surpassed standard structure-based comparison algorithms that use atomic coordinates. As a tool for implicating the role of hyd...

Federated Multi-view Learning for Private Medical Data Integration and Analysis

ACM Transactions on Intelligent Systems and Technology

Along with the rapid expansion of information technology and digitalization of health data, there is an increasing concern on maintaining data privacy while garnering the benefits in the medical field. Two critical challenges are identified: First, medical data is naturally distributed across multiple local sites, making it difficult to collectively train machine learning models without data leakage. Second, in medical applications, data are often collected from different sources and views, resulting in heterogeneity and complexity that requires reconciliation. In this article, we present a generic Federated Multi-view Learning (FedMV) framework for multi-view data leakage prevention. Specifically, we apply this framework to two types of problems based on local data availability: Vertical Federated Multi-view Learning (V-FedMV) and Horizontal Federated Multi-view Learning (H-FedMV). We experimented with real-world keyboard data collected from BiAffect study. Our results demonstrated...

From Known to Unknown

Proceedings of the 31st ACM International Conference on Information & Knowledge Management

Joint Embedding of Structural and Functional Brain Networks with Graph Neural Networks for Mental Illness Diagnosis

2022 44th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC)

Deep reinforcement learning guided graph neural networks for brain network analysis

Neural Networks

A Robust and Generalized Framework for Adversarial Graph Embedding

arXiv (Cornell University), May 22, 2021

Error-robust multi-view clustering

2017 IEEE International Conference on Big Data (Big Data), 2017

Cross-knowledge-graph entity alignment via relation prediction

Knowledge-Based Systems, 2022

Error-Robust Multi-View Clustering: Progress, Challenges and Opportunities

ArXiv, 2021

With recent advances in data collection from multiple sources, multi-view data has received significant attention. In multi-view data, each view represents a different perspective of data. Since label information is often expensive to acquire, multiview clustering has gained growing interest, which aims to obtain better clustering solution by exploiting complementary and consistent information across all views rather than only using an individual view. Due to inevitable sensor failures, data in each view may contain error. Error often exhibits as noise or feature-specific corruptions or outliers. Multi-view data may contain any or combination of these error types. Blindly clustering multi-view data i.e., without considering possible error in view(s) could significantly degrade the performance. The goal of error-robust multi-view clustering is to obtain useful outcome even if the multi-view data is corrupted. Existing error-robust multi-view clustering approaches with explicit error ...

Not Just Cloud Privacy: Protecting Client Privacy in Teacher-Student Learning

ArXiv, 2019

Ensuring the privacy of sensitive data used to train modern machine learning models is of paramount importance in many areas of practice. One recent popular approach to study these concerns is using the differential privacy via a "teacher-student" model, wherein the teacher provides the student with useful, but noisy, information, hopefully allowing the student model to perform well on a given task. However, these studies only solve the privacy concerns of the teacher by assuming the student owns a public but unlabelled dataset. In real life, the student also has privacy concerns on its unlabelled data, so as to inquire about privacy protection on any data sent to the teacher. In this work, we re-design the privacy-preserving "teacher-student" model consisting of adopting both private arbitrary masking and local differential privacy, which protects the sensitive information of each student sample. However, the traditional training of teacher model is not robust o...

DeepVASP-E: A Flexible Analysis of Electrostatic Isopotentials for Finding and Explaining Mechanisms that Control Binding Specificity

Biocomputing 2022, 2021

Federated Depression Detection from Multi-Source Mobile Health Data

ArXiv, 2021

Depression is one of the most common mental illness problems, and the symptoms shown by patients are not consistent, making it difficult to diagnose in the process of clinical practice and pathological research. Although researchers hope that artificial intelligence can contribute to the diagnosis and treatment of depression, the traditional centralized machine learning needs to aggregate patient data, and the data privacy of patients with mental illness needs to be strictly confidential, which hinders machine learning algorithms clinical application. To solve the problem of privacy of the medical history of patients with depression, we implement federated learning to analyze and diagnose depression. First, we propose a general multi-view federated learning framework using multi-source data, which can extend any traditional machine learning model to support federated learning across different institutions or parties. Secondly, we adopt late fusion methods to solve the problem of inco...
