
Showing 1–19 of 19 results for author: Fan, W

Searching in archive q-bio.
  1. arXiv:2407.02634  [pdf, other]

    q-bio.PE

    Inconsistency of parsimony under the multispecies coalescent

    Authors: Daniel Rickert, Wai-Tong Louis Fan, Matthew Hahn

    Abstract: While it is known that parsimony can be statistically inconsistent under certain models of evolution due to high levels of homoplasy, the consistency of parsimony under the multispecies coalescent (MSC) is less well studied. Previous studies have shown the consistency of concatenated parsimony (parsimony applied to concatenated alignments) under the MSC for the rooted 4-taxa case under an infinite…

    Submitted 4 July, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

    Comments: 19 pages, 8 figures, 1 table (v2: resolved PDF error; removed endfloat)

  2. arXiv:2406.12950  [pdf, other]

    q-bio.QM cs.AI cs.CE cs.CL cs.LG

    MolecularGPT: Open Large Language Model (LLM) for Few-Shot Molecular Property Prediction

    Authors: Yuyan Liu, Sirui Ding, Sheng Zhou, Wenqi Fan, Qiaoyu Tan

    Abstract: Molecular property prediction (MPP) is a fundamental and crucial task in drug discovery. However, prior methods are limited by the requirement for a large number of labeled molecules and their restricted ability to generalize for unseen and new tasks, both of which are essential for real-world applications. To address these challenges, we present MolecularGPT for few-shot MPP. From a perspective o…

    Submitted 18 June, 2024; originally announced June 2024.

  3. arXiv:2401.11447  [pdf, other]

    cs.LG q-bio.QM

    Sequential Model for Predicting Patient Adherence in Subcutaneous Immunotherapy for Allergic Rhinitis

    Authors: Yin Li, Yu Xiong, Wenxin Fan, Kai Wang, Qingqing Yu, Liping Si, Patrick van der Smagt, Jun Tang, Nutan Chen

    Abstract: Objective: Subcutaneous Immunotherapy (SCIT) is the long-lasting causal treatment of allergic rhinitis (AR). How to enhance the adherence of patients to maximize the benefit of allergen immunotherapy (AIT) plays a crucial role in the management of AIT. This study aims to leverage novel machine learning models to precisely predict the risk of non-adherence of AR patients and related local symptom s…

    Submitted 19 July, 2024; v1 submitted 21 January, 2024; originally announced January 2024.

    Comments: Frontiers in Pharmacology, research topic: Methods and Metrics to Measure Medication Adherence

  4. arXiv:2310.12435  [pdf, other]

    math.PR q-bio.PE stat.AP

    Correlation of coalescence times in a diploid Wright-Fisher model with recombination and selfing

    Authors: David Kogan, Dimitrios Diamantidis, John Wakeley, Wai-Tong Louis Fan

    Abstract: The correlation among the gene genealogies at different loci is crucial in biology, yet challenging to understand because such correlation depends on many factors including genetic linkage, recombination, natural selection and population structure. Based on a diploid Wright-Fisher model with a single mating type and partial selfing for a constant large population with size $N$, we quantify the com…

    Submitted 18 October, 2023; originally announced October 2023.

    Comments: 39 pages, 6 figures

  5. arXiv:2309.10998  [pdf, other]

    math.PR math.DS math.FA math.SP q-bio.PE

    Quasi-stationary behavior of the stochastic FKPP equation on the circle

    Authors: Wai-Tong Louis Fan, Oliver Tough

    Abstract: We consider the stochastic Fisher-Kolmogorov-Petrovsky-Piscunov (FKPP) equation on the circle $\mathbb{S}$, \begin{equation*} \partial_t u(t,x) \,=\, \frac{\alpha}{2}\Delta u + \beta\,u(1-u) + \sqrt{\gamma\,u(1-u)}\,\dot{W}, \qquad (t,x)\in(0,\infty)\times \mathbb{S}, \end{equation*} where $\dot{W}$ is space-time white noise. While any solution will eventually be absorbed at one of two states, the constant 1 and the con…

    Submitted 9 January, 2024; v1 submitted 19 September, 2023; originally announced September 2023.

    Comments: 41 pages, 3 figures

  6. arXiv:2306.08142  [pdf, other]

    math.PR q-bio.PE

    Latent mutations in the ancestries of alleles under selection

    Authors: Wai-Tong Louis Fan, John Wakeley

    Abstract: We consider a single genetic locus with two alleles $A_1$ and $A_2$ in a large haploid population. The locus is subject to selection and two-way, or recurrent, mutation. Assuming the allele frequencies follow a Wright-Fisher diffusion and have reached stationarity, we describe the asymptotic behaviors of the conditional gene genealogy and the latent mutations of a sample with known allele counts,…

    Submitted 26 April, 2024; v1 submitted 13 June, 2023; originally announced June 2023.

    Comments: 47 pages, 3 figures

  7. Patch formation driven by stochastic effects of interaction between viruses and defective interfering particles

    Authors: Qiantong Liang, Johnny Yang, Wai-Tong Louis Fan, Wing-Cheong Lo

    Abstract: Defective interfering particles (DIPs) are virus-like particles that occur naturally during virus infections. These particles are defective, lacking essential genetic materials for replication, but they can interact with the wild-type virus and potentially be used as therapeutic agents. However, the effect of DIPs on infection spread is still unclear due to complicated stochastic effects and nonli…

    Submitted 24 March, 2023; originally announced March 2023.

    Journal ref: PLoS Comput Biol 19(10), 2023

  8. arXiv:2111.15411  [pdf]

    q-bio.GN

    EndHiC: assemble large contigs into chromosomal-level scaffolds using the Hi-C links from contig ends

    Authors: Sen Wang, Hengchao Wang, Fan Jiang, Anqi Wang, Hangwei Liu, Hanbo Zhao, Boyuan Yang, Dong Xu, Yan Zhang, Wei Fan

    Abstract: Motivation: The application of PacBio HiFi and ultra-long ONT reads has achieved huge progress in contig-level assembly, but it is still challenging to assemble large contigs into chromosomes with available Hi-C scaffolding software, which all compute the contact value between contigs using the Hi-C links from the whole contig regions. As the Hi-C links of two adjacent contigs concentrate onl…

    Submitted 30 November, 2021; originally announced November 2021.

    Comments: 25 pages, 1 figure, 6 supplemental figures, and 6 supplemental Tables

  9. DGL-LifeSci: An Open-Source Toolkit for Deep Learning on Graphs in Life Science

    Authors: Mufei Li, Jinjing Zhou, Jiajing Hu, Wenxuan Fan, Yangkang Zhang, Yaxin Gu, George Karypis

    Abstract: Graph neural networks (GNNs) constitute a class of deep learning methods for graph data. They have wide applications in chemistry and biology, such as molecular property prediction, reaction prediction and drug-target interaction prediction. Despite the interest, GNN-based modeling is challenging as it requires graph data pre-processing and modeling in addition to programming and deep learning. He…

    Submitted 27 June, 2021; originally announced June 2021.

  10. arXiv:2103.01026  [pdf]

    q-bio.NC

    Modelling brain based on canonical ensemble with functional MRI: A thermodynamic exploration on neural system

    Authors: Chenxi Zhou, Bin Yang, Wenliang Fan, Wei Li

    Abstract: Objective. Modelling is an important way to study the working mechanism of the brain, yet the characterization and understanding of the brain are still inadequate. This study tried to build a model of the brain from the perspective of thermodynamics at the system level, which brought a new way of thinking to brain modelling. Approach. Regarding brain regions as systems, voxels as particles, and intensity of signals…

    Submitted 27 March, 2021; v1 submitted 26 February, 2021; originally announced March 2021.

    Comments: 27 pages, 3 figures

    MSC Class: 80-10; 82-10; 92B99 ACM Class: I.5.4; I.6.5; J.3

  11. arXiv:2010.14460  [pdf, other]

    math.PR cs.CE math.ST q-bio.PE

    Impossibility of phylogeny reconstruction from $k$-mer counts

    Authors: Wai-Tong Louis Fan, Brandon Legried, Sebastien Roch

    Abstract: We consider phylogeny estimation under a two-state model of sequence evolution by site substitution on a tree. In the asymptotic regime where the sequence lengths tend to infinity, we show that for any fixed $k$ no statistically consistent phylogeny estimation is possible from $k$-mer counts over the full leaf sequences alone. Formally, we establish that the joint distribution of $k$-mer counts ov…

    Submitted 1 March, 2022; v1 submitted 27 October, 2020; originally announced October 2020.

    Comments: 35 pages, 4 figures

    Journal ref: Annals of Applied Probability, 2022

  12. arXiv:2005.11625  [pdf, ps, other]

    math.PR q-bio.PE

    Impossibility of consistent distance estimation from sequence lengths under the TKF91 model

    Authors: Wai-Tong Louis Fan, Brandon Legried, Sebastien Roch

    Abstract: We consider the problem of distance estimation under the TKF91 model of sequence evolution by insertions, deletions and substitutions on a phylogeny. In an asymptotic regime where the expected sequence lengths tend to infinity, we show that no consistent distance estimation is possible from sequence lengths alone. More formally, we establish that the distributions of pairs of sequence lengths at d…

    Submitted 23 May, 2020; originally announced May 2020.

    Comments: 11 pages, 1 figure

    Journal ref: Bulletin of Mathematical Biology. Vol 82 (9), 2020

  13. arXiv:2003.11817  [pdf]

    q-bio.GN

    Estimation of genome size using k-mer frequencies from corrected long reads

    Authors: Hengchao Wang, Bo Liu, Yan Zhang, Fan Jiang, Yuwei Ren, Lijuan Yin, Hangwei Liu, Sen Wang, Wei Fan

    Abstract: The third-generation long reads sequencing technologies, such as PacBio and Nanopore, have great advantages over second-generation Illumina sequencing in de novo assembly studies. However, due to the inherent low base accuracy, third-generation sequencing data cannot be used for k-mer counting and estimating genomic profile based on k-mer frequencies. Thus, in current genome projects, second-gener…

    Submitted 26 March, 2020; originally announced March 2020.

    Comments: In total, 24 pages include maintext and supplemental. 1 maintext figure, 1 table, 3 supplemental figures, 8 supplemental tables

  14. arXiv:1708.01793  [pdf, ps, other]

    math.PR math.NA q-bio.PE

    Stochastic PDEs on graphs as scaling limits of discrete interacting systems

    Authors: Wai-Tong Louis Fan

    Abstract: Stochastic partial differential equations (SPDE) on graphs were introduced by Cerrai and Freidlin [Ann. Inst. Henri Poincaré Probab. Stat. 53 (2017) 865-899]. This class of stochastic equations in infinite dimensions provides a minimal framework for the study of the effective dynamics of much more complex systems. However, how they emerge from microscopic individual-based models is still poorly un…

    Submitted 17 November, 2020; v1 submitted 5 August, 2017; originally announced August 2017.

    Comments: 39 pages, 2 figures

    MSC Class: 60K35; 60H15; 92C50

  15. arXiv:1707.05711  [pdf, ps, other]

    q-bio.PE cs.CE math.PR math.ST

    Statistically consistent and computationally efficient inference of ancestral DNA sequences in the TKF91 model under dense taxon sampling

    Authors: Wai-Tong Louis Fan, Sebastien Roch

    Abstract: In evolutionary biology, the speciation history of living organisms is represented graphically by a phylogeny, that is, a rooted tree whose leaves correspond to current species and branchings indicate past speciation events. Phylogenies are commonly estimated from molecular sequences, such as DNA sequences, collected from the species of interest. At a high level, the idea behind this inference is…

    Submitted 31 July, 2019; v1 submitted 18 July, 2017; originally announced July 2017.

    Comments: Title modified, 31 pages, 2 Figures and 1 table

  16. arXiv:1707.05702  [pdf, ps, other]

    math.PR cs.IT math.ST q-bio.PE

    Necessary and sufficient conditions for consistent root reconstruction in Markov models on trees

    Authors: Wai-Tong Louis Fan, Sebastien Roch

    Abstract: We establish necessary and sufficient conditions for consistent root reconstruction in continuous-time Markov models with countable state space on bounded-height trees. Here a root state estimator is said to be consistent if the probability that it returns the true root state converges to 1 as the number of leaves tends to infinity. We also derive quantitative bounds on the error of reconstruct…

    Submitted 1 August, 2019; v1 submitted 18 July, 2017; originally announced July 2017.

    Comments: 30 pages, 3 figures, title of reference [FR] is updated

    MSC Class: 60J25; 60J80; 62B10; 62M05

    Journal ref: Electronic Journal of Probability, Vol. 23 (47), 1-24, 2018

  17. arXiv:1507.00918  [pdf, ps, other]

    math.PR q-bio.PE

    Genealogies in Expanding Populations

    Authors: Rick Durrett, Wai-Tong Louis Fan

    Abstract: The goal of this paper is to prove rigorous results for the behavior of genealogies in a one-dimensional long range biased voter model introduced by Hallatschek and Nelson [25]. The first step, which is easily accomplished using results of Mueller and Tribe [38], is to show that when space and time are rescaled correctly, our biased voter model converges to a Wright-Fisher SPDE. A simple extension…

    Submitted 13 January, 2016; v1 submitted 3 July, 2015; originally announced July 2015.

    Comments: 40 pages, 1 figure

    Journal ref: Annals of Applied Probability. Vol. 26 (6), 3456-3490, 2016

  18. arXiv:1308.2012  [pdf]

    q-bio.GN

    Estimation of genomic characteristics by analyzing k-mer frequency in de novo genome projects

    Authors: Binghang Liu, Yujian Shi, Jianying Yuan, Xuesong Hu, Hao Zhang, Nan Li, Zhenyu Li, Yanxiang Chen, Desheng Mu, Wei Fan

    Abstract: Background: With the fast development of next generation sequencing technologies, increasing numbers of genomes are being de novo sequenced and assembled. However, most are in fragmental and incomplete draft status, and thus it is often difficult to know the accurate genome size and repeat content. Furthermore, many genomes are highly repetitive or heterozygous, posing problems to current assemble…

    Submitted 26 February, 2020; v1 submitted 8 August, 2013; originally announced August 2013.

    Comments: In total, 47 pages include maintext and supplemental. 7 maintext figures, 3 tables, 6 supplemental figures, 5 supplemental tables

  19. arXiv:0708.1598  [pdf, ps, other]

    q-bio.GN

    Genomes: at the edge of chaos with maximum information capacity

    Authors: Sing-Guan Kong, Hong-Da Chen, Wen-Lang Fan, Jan Wigger, Andrew Torda, HC Lee

    Abstract: We propose an order index, phi, which quantifies the notion of ``life at the edge of chaos'' when applied to genome sequences. It maps genomes to a number from 0 (random and of infinite length) to 1 (fully ordered) and applies regardless of sequence length. The 786 complete genomic sequences in GenBank were found to have phi values in a very narrow range, 0.037+/-0.027. We show this implies that…

    Submitted 12 August, 2007; originally announced August 2007.

    Comments: 4 pages, 3 figures, paper