-
Understanding data analysis aspects of TMS-EEG in clinical study: a mini review and a case study with open dataset
Authors:
Hua Cheng
Abstract:
Concurrent transcranial magnetic stimulation with electroencephalography (TMS-EEG) is a powerful and challenging methodology for basic research and clinical applications. Effective TMS-EEG recording and analysis require attention to several aspects of the experiment, including artifact management, data analysis and interpretation, and protocols. This mini review offers extensive insight into TMS-EEG methodology in experimental and computational procedures. The case study leverages an openly available, high-quality EEG dataset to examine alterations in cortical activity. By applying intermittent theta-burst stimulation (iTBS) and continuous theta-burst stimulation (cTBS) to the left dorsolateral prefrontal cortex (DLPFC) in healthy individuals, we observe changes in oscillatory patterns within the EEG data. The dataset includes meticulously extracted resting-state EEG recordings, TMS-evoked potential data, and MRI scans. To process these data, we used Brainstorm, an open-source Matlab application, which facilitated noise reduction through independent component analysis and signal-space projection techniques. It allowed us to identify, visualize, and analyze TMS-evoked potentials (TEPs) and TMS-induced oscillations (TIOs). In addition, the study presents detailed plots of resting-state EEG power, local mean field power (LMFP), TMS-related spectral perturbation (TRSP), and inter-trial phase clustering (ITPC). Paired t-tests and cluster-based permutation tests were performed for statistical analysis. The wealth and quality of this dataset make it ideal for examining the neuromodulatory impact of TBS on the prefrontal cortex. Brainstorm's extensive feature set greatly supports the exploration of such neurological data. Future research could concentrate on source localization analyses and comparative group studies.
Submitted 9 March, 2024;
originally announced March 2024.
-
AdaMR: Adaptable Molecular Representation for Unified Pre-training Strategy
Authors:
Yan Ding,
Hao Cheng,
Ziliang Ye,
Ruyi Feng,
Wei Tian,
Peng Xie,
Juan Zhang,
Zhongze Gu
Abstract:
We propose Adjustable Molecular Representation (AdaMR), a new large-scale unified pre-training strategy for small-molecule drugs. AdaMR utilizes a granularity-adjustable molecular encoding strategy, which is accomplished through a pre-training task termed molecular canonicalization, setting it apart from recent large-scale molecular models. This adaptability in granularity enriches the model's learning capability at multiple levels and improves its performance in multi-task scenarios. Specifically, the substructure-level molecular representation preserves information about specific atom groups or arrangements, which influence chemical properties and functionalities. This proves advantageous for tasks such as property prediction. Simultaneously, the atomic-level representation, combined with generative molecular canonicalization pre-training tasks, enhances validity, novelty, and uniqueness in generative tasks. All of these features work together to give AdaMR outstanding performance on a range of downstream tasks. We fine-tuned our proposed pre-trained model on six molecular property prediction tasks (MoleculeNet datasets) and two generative tasks (ZINC250K datasets), achieving state-of-the-art (SOTA) results on five out of eight tasks.
Submitted 27 April, 2024; v1 submitted 28 December, 2023;
originally announced January 2024.
-
Scalable telomere-to-telomere assembly for diploid and polyploid genomes with double graph
Authors:
Haoyu Cheng,
Mobin Asri,
Julian Lucas,
Sergey Koren,
Heng Li
Abstract:
Despite recent advances in the length and the accuracy of long-read data, building haplotype-resolved genome assemblies from telomere to telomere still requires considerable computational resources. In this study, we present an efficient de novo assembly algorithm that combines multiple sequencing technologies to scale up population-wide telomere-to-telomere assemblies. By utilizing twenty-two human and two plant genomes, we demonstrate that our algorithm is around an order of magnitude cheaper than existing methods, while producing better diploid and haploid assemblies. Notably, our algorithm is the only feasible solution to the haplotype-resolved assembly of polyploid genomes.
Submitted 6 June, 2023;
originally announced June 2023.
-
De novo reconstruction of satellite repeat units from sequence data
Authors:
Yujie Zhang,
Justin Chu,
Haoyu Cheng,
Heng Li
Abstract:
Satellite DNA consists of long tandemly repeating sequences in a genome that may be organized as high-order repeats (HORs). These sequences are enriched in centromeres and are challenging to assemble. Existing algorithms for identifying satellite repeats either require the complete assembly of satellites or only work for simple repeat structures without HORs. Here we describe Satellite Repeat Finder (SRF), a new algorithm for reconstructing satellite repeat units and HORs from accurate reads or assemblies without prior knowledge of repeat structures. Applying SRF to real sequence data, we showed that SRF could reconstruct known satellites in human and well-studied model organisms. We also found that satellite repeats are pervasive in various other species, accounting for up to 12% of their genome contents, but are often underrepresented in assemblies. With the rapid progress in genome sequencing, SRF will help the annotation of new genomes and the study of satellite DNA evolution even if such repeats are not fully assembled.
Submitted 19 April, 2023;
originally announced April 2023.
-
Learning with augmented target information: An alternative theory of Feedback Alignment
Authors:
Huzi Cheng,
Joshua W. Brown
Abstract:
While error backpropagation (BP) has dominated the training of nearly all modern neural networks for a long time, it suffers from several biological plausibility issues such as the symmetric weight requirement and synchronous updates. Feedback Alignment (FA) was proposed as an alternative to BP to address those dilemmas and has been demonstrated to be effective on various tasks and network architectures. Despite its simplicity and effectiveness, a satisfying explanation of how FA works across different architectures is still lacking. Here we propose a novel, architecture-agnostic theory of how FA works through the lens of information theory: Instead of approximating gradients calculated by BP with the same parameter, FA learns effective representations by embedding target information into neural networks to be trained. We show this through the analysis of FA dynamics in idealized settings and then via a series of experiments. Based on the implications of this theory, we designed three variants of FA and show their comparable performance on several tasks. These variants also account for some phenomena and theories in neuroscience such as predictive coding and representational drift.
Submitted 3 April, 2023;
originally announced April 2023.
-
Metagenome assembly of high-fidelity long reads with hifiasm-meta
Authors:
Xiaowen Feng,
Haoyu Cheng,
Daniel Portik,
Heng Li
Abstract:
Current metagenome assemblers developed for short sequence reads or noisy long reads were not optimized for accurate long reads. Here we describe hifiasm-meta, a new metagenome assembler that exploits the high accuracy of recent data. Evaluated on seven empirical datasets, hifiasm-meta reconstructed tens to hundreds of complete circular bacterial genomes per dataset, consistently outperforming other metagenome assemblers.
Submitted 15 October, 2021;
originally announced October 2021.
-
Robust haplotype-resolved assembly of diploid individuals without parental data
Authors:
Haoyu Cheng,
Erich D. Jarvis,
Olivier Fedrigo,
Klaus-Peter Koepfli,
Lara Urban,
Neil J. Gemmell,
Heng Li
Abstract:
Routine single-sample haplotype-resolved assembly remains an unresolved problem. Here we describe a new algorithm that combines PacBio HiFi reads and Hi-C chromatin interaction data to produce a haplotype-resolved assembly without the sequencing of parents. Applied to human and other vertebrate samples, our algorithm consistently outperforms existing single-sample assembly pipelines and generates assemblies of comparable quality to the best pedigree-based assemblies.
Submitted 10 September, 2021;
originally announced September 2021.
-
Spinopelvic Anatomic Parameters Prediction Model of NSLBP based on data mining
Authors:
Hua Cheng
Abstract:
Objective: The purpose of this study is to analyze an open low back pain data set to predict the incidence of non-specific chronic low back pain (NSLBP) and obtain a more accurate and convenient sagittal spinopelvic parameter model. Methods: Logistic regression analysis and a multilayer perceptron (MLP) algorithm are used to construct an NSLBP prediction model based on spinopelvic parameters from an open data source. Results: Degree of spondylolisthesis (DS), pelvic radius (PR), sacral slope (SS), and pelvic tilt (PT) are the four predictors identified by regression analysis as having significant predictive power for the risk of NSLBP. The overall accuracy of the equation prediction model is 85.8%. The MLP network algorithm determines, through more precise modeling, that DS is the most powerful predictor of NSLBP. The model has good predictive ability, with an accuracy of 95.2%. Conclusions: MLP models are more accurate for constructing predictive models. Computer science is playing a greater role in supporting clinical research in precision medicine.
Submitted 13 September, 2020;
originally announced September 2020.
-
Haplotype-resolved de novo assembly with phased assembly graphs
Authors:
Haoyu Cheng,
Gregory T Concepcion,
Xiaowen Feng,
Haowen Zhang,
Heng Li
Abstract:
Haplotype-resolved de novo assembly is the ultimate solution to the study of sequence variations in a genome. However, existing algorithms either collapse heterozygous alleles into one consensus copy or fail to cleanly separate the haplotypes to produce high-quality phased assemblies. Here we describe hifiasm, a new de novo assembler that takes advantage of long high-fidelity sequence reads to faithfully represent the haplotype information in a phased assembly graph. Unlike other graph-based assemblers that only aim to maintain the contiguity of one haplotype, hifiasm strives to preserve the contiguity of all haplotypes. This feature enables the development of a graph trio binning algorithm that greatly advances over standard trio binning. On three human and five non-human datasets, including California redwood with a $\sim$30-gigabase hexaploid genome, we show that hifiasm frequently delivers better assemblies than existing tools and consistently outperforms others on haplotype-resolved assembly.
Submitted 3 August, 2020;
originally announced August 2020.
-
Fitting IVIM with Variable Projection and Simplicial Optimization
Authors:
Shreyas Fadnavis,
Hamza Farooq,
Maryam Afzali,
Christoph Lenglet,
Tryphon Georgiou,
Hu Cheng,
Sharlene Newman,
Shahnawaz Ahmed,
Rafael Neto Henriques,
Eric Peterson,
Serge Koudoro,
Ariel Rokem,
Eleftherios Garyfallidis
Abstract:
Fitting multi-exponential models to Diffusion MRI (dMRI) data has always been challenging due to various underlying complexities. In this work, we introduce a novel and robust fitting framework for the standard two-compartment IVIM microstructural model. This framework provides a significant improvement over the existing methods and helps estimate the associated diffusion and perfusion parameters of IVIM in an automatic manner. As a part of this work we provide capabilities to switch between more advanced global optimization methods such as simplicial homology (SH) and differential evolution (DE). Our experiments show that the results obtained from this simultaneous fitting procedure disentangle the model parameters in a reduced subspace. The proposed framework extends the seminal work originated in the MIX framework, with improved procedures for multi-stage fitting. This framework has been made available as an open-source Python implementation and disseminated to the community through the DIPY project.
Submitted 15 February, 2020; v1 submitted 27 September, 2019;
originally announced October 2019.
-
Short-term effect of hyperbaric exposure on Ventilation: A Control Study of 12m-depth Single No-decompression Dive Experiment
Authors:
Hua Cheng
Abstract:
Objective: To study the extent and duration of the effect on ventilation of a single no-decompression dive to 12 meters. Methods: Twenty-nine healthy volunteer divers were assigned to SCUBA diving at 12 m depth underwater (the Experimental Group, EG) and to a chamber dive at 2.2 ATA for 20 min (the Control Group, CG), matched for age, gender, BMI, and Forced Vital Capacity (FVC). Ventilation function was measured by spirometer before diving and at 1 h and 24 h after hyperbaric exposure. Independent-samples t-tests were used to compare the EG and CG, and repeated-measures analysis of variance across the time points before and after high-pressure exposure was performed in SPSS 20.0. Results: The Inspiratory Reserve Volume (IRV) rose while the Expiratory Reserve Volume (ERV) fell significantly at 1 h after high-pressure release (p<0.05). The Inspiratory Capacity (IC) and the Vital Capacity (VC) increased accordingly. The ratio of FEV1.0 to VC (FEV1.0%t) was higher in the CG than in the EG (t=-2.189, p=0.033) due to the change in VC. These effects did not last for 24 h after high-pressure relief. Conclusions: Ventilation is restricted during 20 min of hyperbaric exposure, whether under 12 m of water or in a 2.2 ATA hyperbaric chamber, but the effect recovers to close to normal within 24 h. The extent of restriction is larger for underwater diving than for the dry-air hyperbaric chamber dive. The higher density of the water medium, compression of lower-limb blood volume during submersion, and the increased inertia added by the portable underwater breathing apparatus might all contribute to the ventilation effects.
Submitted 11 June, 2017;
originally announced June 2017.
-
RNA-Seq Mapping Errors When Using Incomplete Reference Transcriptomes of Vertebrates
Authors:
Alexis Black Pyrkosz,
Hans Cheng,
C. Titus Brown
Abstract:
Whole transcriptome sequencing is increasingly being used as a functional genomics tool to study non-model organisms. However, when the reference transcriptome used to calculate differential expression is incomplete, significant error in the inferred expression levels can result. In this study, we use simulated reads generated from real transcriptomes to determine the accuracy of read mapping, and measure the error resulting from using an incomplete transcriptome. We show that the two primary sources of counting error are 1) alternative splice variants that share reads and 2) missing transcripts from the reference. Alternative splice variants increase the false positive rate of mapping while incomplete reference transcriptomes decrease the true positive rate, leading to inaccurate transcript expression levels. Grouping transcripts by gene or read sharing (similar to mapping to a reference genome) significantly decreases false positives, but only by improving the reference transcriptome itself can the missing transcript problem be addressed. We also demonstrate that employing different mapping software does not yield substantial increases in accuracy on simulated data. Finally, we show that read lengths or insert sizes must increase past 1kb to resolve mapping ambiguity.
Submitted 10 March, 2013;
originally announced March 2013.
-
Skeletal Rigidity of Phylogenetic Trees
Authors:
Howard Cheng,
Satyan L. Devadoss,
Brian Li,
Andrej Risteski
Abstract:
Motivated by geometric origami and the straight skeleton construction, we outline a map between spaces of phylogenetic trees and spaces of planar polygons. The limitations of this map are studied through explicit examples, culminating in the proof of a structural rigidity result.
Submitted 26 March, 2012;
originally announced March 2012.