Alljoined1 -- A dataset for EEG-to-Image decoding

Authors: Jonathan Xu, Bruno Aristimunha, Max Emanuel Feucht, Emma Qian, Charles Liu, Tazik Shahjahan, Martyna Spyra, Steven Zifan Zhang, Nicholas Short, Jioh Kim, Paula Perdomo, Ricky Renfeng Mao, Yashvir Sabharwal, Michael Ahedor Moaz Shoura, Adrian Nestor

Abstract: We present Alljoined1, a dataset built specifically for EEG-to-Image decoding. Recognizing that an extensive and unbiased sampling of neural responses to visual stimuli is crucial for image reconstruction efforts, we collected data from 8 participants looking at 10,000 natural images each. We have currently gathered 46,080 epochs of brain responses recorded with a 64-channel EEG headset. The dataset combines response-based stimulus timing, repetition between blocks and sessions, and diverse image classes with the goal of improving signal quality. For transparency, we also provide data quality scores. We publicly release the dataset and all code at https://linktr.ee/alljoined1.

Submitted 14 May, 2024; v1 submitted 8 April, 2024; originally announced April 2024.

Comments: 8 Pages, 6 Figures

ACM Class: I.5.1; I.6.3; I.2.6; K.3.2

arXiv:2312.17495 [pdf]

Integrating Chemical Language and Molecular Graph in Multimodal Fused Deep Learning for Drug Property Prediction

Authors: Xiaohua Lu, Liangxu Xie, Lei Xu, Rongzhi Mao, Shan Chang, Xiaojun Xu

Abstract: Accurately predicting molecular properties is a challenging but essential task in drug discovery. Recently, many mono-modal deep learning methods have been successfully applied to molecular property prediction. However, the inherent limitation of mono-modal learning arises from relying solely on one modality of molecular representation, which restricts a comprehensive understanding of drug molecules and hampers their resilience against data noise. To overcome the limitations, we construct multimodal deep learning models to cover different molecular representations. We convert drug molecules into three molecular representations, SMILES-encoded vectors, ECFP fingerprints, and molecular graphs. To process the modal information, Transformer-Encoder, bi-directional gated recurrent units (BiGRU), and graph convolutional network (GCN) are utilized for feature learning respectively, which can enhance the model capability to acquire complementary and naturally occurring bioinformatics information. We evaluated our triple-modal model on six molecule datasets. Different from bi-modal learning models, we adopt five fusion methods to capture the specific features and leverage the contribution of each modal information better. Compared with mono-modal models, our multimodal fused deep learning (MMFDL) models outperform single models in accuracy, reliability, and resistance capability against noise. Moreover, we demonstrate its generalization ability in the prediction of binding constants for protein-ligand complex molecules in the refined set of PDBbind. The advantage of the multimodal model lies in its ability to process diverse sources of data using proper models and suitable fusion methods, which would enhance the noise resistance of the model while obtaining data diversity.

Submitted 29 December, 2023; originally announced December 2023.

arXiv:2309.14239 [pdf]

Simulation-Based Design of Bicuspidization of the Aortic Valve

Authors: Alexander D. Kaiser, Moussa A. Haidar, Perry S. Choi, Amit Sharir, Alison L. Marsden, Michael R. Ma

Abstract: Objective: Severe congenital aortic valve pathology in the growing patient remains a challenging clinical scenario. Bicuspidization of the diseased aortic valve has proven to be a promising repair technique with acceptable durability. However, most understanding of the procedure is empirical and retrospective. This work seeks to design the optimal gross morphology associated with surgical bicuspidization with simulations, based on the hypothesis that modifications to the free edge length cause or relieve stenosis. Methods: Model bicuspid valves were constructed with varying free edge lengths and gross morphology. Fluid-structure interaction simulations were conducted in a single patient-specific model geometry. The models were evaluated for primary targets of stenosis and regurgitation. Secondary targets were assessed and included qualitative hemodynamics, geometric height, effective height, orifice area and prolapse. Results: Stenosis decreased with increasing free edge length and was pronounced with free edge length less than or equal to 1.3 times the annular diameter d. With free edge length 1.5d or greater, no stenosis occurred. All models were free of regurgitation. Substantial prolapse occurred with free edge length greater than or equal to 1.7d. Conclusions: Free edge length greater than or equal to 1.5d was required to avoid aortic stenosis in simulations. Cases with free edge length greater than or equal to 1.7d showed excessive prolapse and other changes in gross morphology. Cases with free edge length 1.5-1.6d have a total free edge length approximately equal to the annular circumference and appeared optimal. These effects should be studied in vitro and in animal studies.

Submitted 25 September, 2023; originally announced September 2023.

MSC Class: 92C35 (Primary); 92C50; 92C32; 76Z05 (Secondary) ACM Class: J.3.1

arXiv:2308.01839 [pdf, other]

doi 10.1073/pnas.2313719121

Is your data alignable? Principled and interpretable alignability testing and integration of single-cell data

Authors: Rong Ma, Eric D. Sun, David Donoho, James Zou

Abstract: Single-cell data integration can provide a comprehensive molecular view of cells, and many algorithms have been developed to remove unwanted technical or biological variations and integrate heterogeneous single-cell datasets. Despite their wide usage, existing methods suffer from several fundamental limitations. In particular, we lack a rigorous statistical test for whether two high-dimensional single-cell datasets are alignable (and therefore should even be aligned). Moreover, popular methods can substantially distort the data during alignment, making the aligned data and downstream analysis difficult to interpret. To overcome these limitations, we present a spectral manifold alignment and inference (SMAI) framework, which enables principled and interpretable alignability testing and structure-preserving integration of single-cell data with the same type of features. SMAI provides a statistical test to robustly assess the alignability between datasets to avoid misleading inference, and is justified by high-dimensional statistical theory. On a diverse range of real and simulated benchmark datasets, it outperforms commonly used alignment methods. Moreover, we show that SMAI improves various downstream analyses such as identification of differentially expressed genes and imputation of single-cell spatial transcriptomics, providing further biological insights. SMAI's interpretability also enables quantification and a deeper understanding of the sources of technical confounders in single-cell data.

Submitted 29 February, 2024; v1 submitted 3 August, 2023; originally announced August 2023.

Journal ref: Proceedings of the National Academy of Sciences, 2024, 121(10) e2313719121

arXiv:2210.13711 [pdf, other]

A Spectral Method for Assessing and Combining Multiple Data Visualizations

Authors: Rong Ma, Eric D. Sun, James Zou

Abstract: Dimension reduction and data visualization aim to project a high-dimensional dataset to a low-dimensional space while capturing the intrinsic structures in the data. It is an indispensable part of modern data science, and many dimension reduction and visualization algorithms have been developed. However, different algorithms have their own strengths and weaknesses, making it critically important to evaluate their relative performance for a given dataset, and to leverage and combine their individual strengths. In this paper, we propose an efficient spectral method for assessing and combining multiple visualizations of a given dataset produced by diverse algorithms. The proposed method provides a quantitative measure -- the visualization eigenscore -- of the relative performance of the visualizations for preserving the structure around each data point. Then it leverages the eigenscores to obtain a consensus visualization, which has much improved quality over the individual visualizations in capturing the underlying true data structure. Our approach is flexible and works as a wrapper around any visualizations. We analyze multiple simulated and real-world datasets from diverse applications to demonstrate the effectiveness of the eigenscores for evaluating visualizations and the superiority of the proposed consensus visualization. Furthermore, we establish rigorous theoretical justification of our method based on a general statistical framework, yielding fundamental principles behind the empirical success of consensus visualization along with practical guidance.

Submitted 24 October, 2022; originally announced October 2022.

Comments: Under revision of Nature Communications

arXiv:2206.14566 [pdf]

doi 10.1016/j.crmeth.2023.100540

Using Interpretable Machine Learning to Massively Increase the Number of Antibody-Virus Interactions Across Studies

Authors: Tal Einav, Rong Ma

Abstract: A central challenge in every field of biology is to use existing measurements to predict the outcomes of future experiments. In this work, we consider the wealth of antibody inhibition data against variants of the influenza virus. Due to this virus's genetic diversity and evolvability, the variants examined in one study will often have little-to-no overlap with other studies, making it difficult to discern common patterns or unify datasets for further analysis. To that end, we develop a computational framework that predicts how an antibody or serum would inhibit any variant from any other study. We use this framework to greatly expand seven influenza datasets utilizing hemagglutination inhibition, validating our method upon 200,000 existing measurements and predicting 2,000,000 new values along with their uncertainties. With these new values, we quantify the transferability between seven vaccination and infection studies in humans and ferrets, show that the serum potency is negatively correlated with breadth, and present a tool for pandemic preparedness. This data-driven approach does not require any information beyond each virus's name and measurements, and even datasets with as few as 5 viruses can be expanded, making this approach widely applicable. Future influenza studies using hemagglutination inhibition can directly utilize our curated datasets to predict newly measured antibody responses against ~80 H3N2 influenza viruses from 1968-2011, whereas immunological studies utilizing other viruses or a different assay only need a single partially-overlapping dataset to extend their work. In essence, this approach enables a shift in perspective when analyzing data from "what you see is what you get" into "what anyone sees is what everyone gets."

Submitted 30 October, 2022; v1 submitted 10 June, 2022; originally announced June 2022.

Journal ref: Cell Reports Methods, 2023

arXiv:2206.05052 [pdf]

Meta-data Study in Autism Spectrum Disorder Classification Based on Structural MRI

Authors: Ruimin Ma, Yanlin Wang, Yanjie Wei, Yi Pan

Abstract: Accurate diagnosis of autism spectrum disorder (ASD) based on neuroimaging data has significant implications, as extracting useful information from neuroimaging data for ASD detection is challenging. Even though machine learning techniques have been leveraged to improve the information extraction from neuroimaging data, the varying data quality caused by different meta-data conditions (i.e., data collection strategies) limits the effective information that can be extracted, thus leading to data-dependent predictive accuracies in ASD detection, which can be worse than random guess in some cases. In this work, we systematically investigate the impact of three kinds of meta-data on the predictive accuracy of classifying ASD based on structural MRI collected from 20 different sites, where meta-data conditions vary.

Submitted 8 June, 2022; originally announced June 2022.

arXiv:2111.12566 [pdf, other]

Acoustical Analysis of Speech Under Physical Stress in Relation to Physical Activities and Physical Literacy

Authors: Si-Ioi Ng, Rui-Si Ma, Tan Lee, Raymond Kim-Wai Sum

Abstract: Human speech production encompasses physiological processes that naturally react to physical stress. Stress caused by physical activity (PA), e.g., running, may lead to significant changes in a person's speech. The major changes are related to the aspects of pitch level, speaking rate, pause pattern, and breathiness. The extent of change depends presumably on physical fitness and well-being of the person, as well as intensity of PA. The general wellness of a person is further related to his/her physical literacy (PL), which refers to a holistic description of engagement in PA. This paper presents the development of a Cantonese speech database that contains audio recordings of speech before and after physical exercises of different intensity levels. The corpus design and data collection process are described. Preliminary results of acoustical analysis are presented to illustrate the impact of PA on pitch level, pitch range, speaking and articulation rate, and time duration of pauses. It is also noted that the effect of PA is correlated to some of the PA and PL measures.

Submitted 25 April, 2022; v1 submitted 20 November, 2021; originally announced November 2021.

Comments: Accepted to Speech Prosody 2022

arXiv:2002.09283 [pdf]

doi 10.1038/s41597-022-01211-x

MODMA dataset: a Multi-modal Open Dataset for Mental-disorder Analysis

Authors: Hanshu Cai, Yiwen Gao, Shuting Sun, Na Li, Fuze Tian, Han Xiao, Jianxiu Li, Zhengwu Yang, Xiaowei Li, Qinglin Zhao, Zhenyu Liu, Zhijun Yao, Minqiang Yang, Hong Peng, Jing Zhu, Xiaowei Zhang, Guoping Gao, Fang Zheng, Rui Li, Zhihua Guo, Rong Ma, Jing Yang, Lan Zhang, Xiping Hu, Yumin Li, et al. (1 additional author not shown)

Abstract: According to the World Health Organization, the number of mental disorder patients, especially depression patients, has grown rapidly and become a leading contributor to the global burden of disease. However, the present common practice of depression diagnosis is based on interviews and clinical scales carried out by doctors, which is not only labor-consuming but also time-consuming. One important reason is the lack of physiological indicators for mental disorders. With the rise of tools such as data mining and artificial intelligence, using physiological data to explore new possible physiological indicators of mental disorder and creating new applications for mental disorder diagnosis has become a new research hot topic. However, good quality physiological data for mental disorder patients are hard to acquire. We present a multi-modal open dataset for mental-disorder analysis. The dataset includes EEG and audio data from clinically depressed patients and matching normal controls. All our patients were carefully diagnosed and selected by professional psychiatrists in hospitals. The EEG dataset includes not only data collected using a traditional 128-electrode mounted elastic cap, but also a novel wearable 3-electrode EEG collector for pervasive applications. The 128-electrode EEG signals of 53 subjects were recorded both in resting state and under stimulation; the 3-electrode EEG signals of 55 subjects were recorded in resting state; the audio data of 52 subjects were recorded during interviewing, reading, and picture description. We encourage other researchers in the field to use it for testing their methods of mental-disorder analysis.

Submitted 4 March, 2020; v1 submitted 20 February, 2020; originally announced February 2020.

Journal ref: Sci Data 9, 178 (2022)

arXiv:1904.05198 [pdf]

Detection of Metallo-β-Lactamases-Encoding Genes Among Clinical Isolates of Escherichia Coli in a Tertiary Care Hospital, Malaysia

Authors: Fazlul MKK, Deepthi S, Farzana Y, Najnin A, Rashid Ma, Munira B, Srikumar C, Nazmul

Abstract: Multidrug-resistant Escherichia coli strains cause multiple clinical infections and have become a rising problem globally. Metallo-β-lactamase (MBL) encoding genes are a particularly severe concern in gram-negative bacteria, especially E. coli. This study aimed to evaluate the prevalence of MBLs among the clinical isolates of E. coli. A total of 65 E. coli isolates were collected from various clinical samples of Malaysian patients with bacterial infections. Conventional microbiological tests were performed for isolation and identification of MBL-producing E. coli strains in this vicinity. Multidrug resistance (MDR) of the E. coli isolates was assessed using the disk diffusion test. Phenotypic methods, as well as genotypic PCR methods, were performed to detect the presence of metallo-β-lactamase resistance genes (blaIMP, blaVIM) in imipenem-resistant strains. Out of 65 E. coli isolates, 42 isolates (57.3%) were MDR. The isolates from urine (19) produced significantly more MDR (10) isolates than other sources. Additionally, 19 (29.2%) imipenem-resistant E. coli isolates contained 10 MBL genes: 7 (36.8%) isolates contained blaIMP and 3 (15.7%) isolates contained blaVIM genes. This study revealed the significant occurrence of MBL-producing E. coli isolates in clinical specimens and its association with health risk factors, indicating an alarming situation in Malaysia. It demands appropriate attention to avoid failure of treatments and infection control management.

Submitted 10 April, 2019; originally announced April 2019.

arXiv:1902.02014 [pdf]

doi 10.31838/ijpr/2019.11.01.031

Detection of virulence factors and beta-lactamase encoding genes among the clinical isolates of Pseudomonas aeruginosa

Authors: Fazlul MKK, Najnin A, Farzana Y, Rashid MA, Deepthi S, Srikumar C, SS Rashid, Nazmul MHM

Abstract: Background: Pseudomonas aeruginosa has emerged as a significant opportunistic bacterial pathogen that causes nosocomial infections in healthcare settings resulting in treatment failure throughout the world. This study was carried out to compare the relatedness between virulence characteristics and β-lactamase encoding genes producing Pseudomonas aeruginosa. Methods: A total of 120 P. aeruginosa isolates were obtained from both paediatric and adult patients of Selayang Hospital, Kuala Lumpur, Malaysia. Phenotypic methods were used to detect various virulence factors (Phospholipase, Hemolysin, Gelatinase, DNAse, and Biofilm). All the isolates were evaluated for production of extended spectrum beta-lactamase (ESBL) as well as metallo β-lactamase (MBL) by Double-disk synergy test (DDST) and E-test while AmpC β-lactamase production was detected by disk antagonism test. Results: In this study, 120 Pseudomonas aeruginosa isolates (20 each from blood, wounds, respiratory secretions, stools, urine, and sputum samples) were studied. Among Pseudomonas aeruginosa isolates, the distribution of virulence factors was positive for hemolysin (48.33%), DNAse (43.33%), phospholipase (40.83%), gelatinase (31.66%) production and biofilm formation (34%) respectively. The prevalence of multiple β-lactamase in P. aeruginosa showed 19.16% ESBL, 7.5% MBL and 10.83% AmpC production respectively. Conclusion: A regular surveillance is required to reduce public healt

Submitted 5 February, 2019; originally announced February 2019.

Comments: International Journal of Pharmaceutical Research, 2019

Journal ref: International Journal of Pharmaceutical Research, 2019

arXiv:1901.02936 [pdf, other]

The Mahalanobis kernel for heritability estimation in genome-wide association studies: fixed-effects and random-effects methods

Authors: Ruijun Ma, Lee H. Dicker

Abstract: Linear mixed models (LMMs) are widely used for heritability estimation in genome-wide association studies (GWAS). In standard approaches to heritability estimation with LMMs, a genetic relationship matrix (GRM) must be specified. In GWAS, the GRM is frequently a correlation matrix estimated from the study population's genotypes, which corresponds to a normalized Euclidean distance kernel. In this paper, we show that reliance on the Euclidean distance kernel contributes to several unresolved modeling inconsistencies in heritability estimation for GWAS. These inconsistencies can cause biased heritability estimates in the presence of linkage disequilibrium (LD), depending on the distribution of causal variants. We show that these biases can be resolved (at least at the modeling level) if one adopts a Mahalanobis distance-based GRM for LMM analysis. Additionally, we propose a new definition of partitioned heritability -- the heritability attributable to a subset of genes or single nucleotide polymorphisms (SNPs) -- using the Mahalanobis GRM, and show that it inherits many of the nice consistency properties identified in our original analysis. Partitioned heritability is a relatively new area for GWAS analysis, where inconsistency issues related to LD have previously been known to be especially pernicious.

Submitted 9 January, 2019; originally announced January 2019.

Comments: 21 pages, 4 figures

arXiv:1401.7036 [pdf, other]

doi 10.1103/PhysRevLett.113.048101

Active phase and amplitude fluctuations of flagellar beating

Authors: Rui Ma, Gary S. Klindt, Ingmar H. Riedel-Kruse, Frank Jülicher, Benjamin M. Friedrich

Abstract: The eukaryotic flagellum beats periodically, driven by the oscillatory dynamics of molecular motors, to propel cells and pump fluids. Small, but perceivable fluctuations in the beat of individual flagella have physiological implications for synchronization in collections of flagella as well as for hydrodynamic interactions between flagellated swimmers. Here, we characterize phase and amplitude fluctuations of flagellar bending waves using shape mode analysis and limit cycle reconstruction. We report a quality factor of flagellar oscillations, $Q=38.0\pm 16.7$ (mean$\pm$s.e.). Our analysis shows that flagellar fluctuations are dominantly of active origin. Using a minimal model of collective motor oscillations, we demonstrate how the stochastic dynamics of individual motors can give rise to active small-number fluctuations in motor-cytoskeleton systems.

Submitted 2 July, 2014; v1 submitted 27 January, 2014; originally announced January 2014.

Comments: accepted for publication in Physical Review Letters

Showing 1–13 of 13 results for author: Ma, R