Drosophila melanogaster females have two X chromosomes and two autosome sets (XX;AA), while males have a single X chromosome and two autosome sets (X;AA). Drosophila male somatic cells compensate for a single copy of the X chromosome by deploying male-specific-lethal (MSL) complexes that increase transcription from the X chromosome. Male germ cells lack MSL complexes, indicating that either germline X-chromosome dosage compensation is MSL-independent, or that germ cells do not carry out dosage compensation. To investigate whether dosage compensation occurs in germ cells, we directly assayed X-chromosome transcripts using DNA microarrays and show equivalent expression in XX;AA and X;AA germline tissues. In X;AA germ cells, expression from the single X chromosome is about twice that of a single autosome. This mechanism ensures balanced X-chromosome expression between the sexes and, more importantly, it ensures balanced expression between the single X chromosome and the autosome set. O...
Although the relationship between exocytosis and calcium is fundamental both to synaptic and nonneuronal secretory function, analysis is problematic because of the temporal and spatial properties of calcium, and the fact that vesicle transport, priming, retrieval, and recycling are coupled. By analyzing the kinetics of sea urchin egg secretory vesicle exocytosis in vitro, the final steps of exocytosis are resolved. These steps are modeled as a three-state system: activated, committed, and fused, where interstate transitions are given by the probabilities that an active fusion complex commits (α) and that a committed fusion complex results in fusion, p. The number of committed complexes per vesicle docking site is Poisson distributed with mean n. Experimentally, p and n increase with increasing calcium, whereas α and the pn ratio remain constant, reducing the kinetic description to only one calcium-dependent, controlling variable, n. On average, the calcium dependence of the maximum ...
Machine learning approaches are an attractive option for analyzing large-scale data to detect genetic variants that contribute to variation of a quantitative trait, without requiring specific distributional assumptions. We evaluate two machine learning methods, random forests and logic regression, and compare them to standard simple univariate linear regression, using the Genetic Analysis Workshop 17 mini-exome data. We also apply these methods after collapsing multiple rare variants within genes and within gene pathways. Linear regression and the random forest method performed better when rare variants were collapsed based on genes or gene pathways than when each variant was analyzed separately. Logic regression performed better when rare variants were collapsed based on genes rather than on pathways.
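As a rough illustration of what collapsing rare variants within genes can look like, the sketch below sums each sample's rare-allele counts over all variants assigned to the same gene. The abstract does not say which collapsing rule was used, so the count-based rule, the function name `collapse_by_gene`, and the input layout are all assumptions made for this example.

```python
def collapse_by_gene(genotypes, variant_gene):
    """Collapse per-variant allele counts into per-gene burden scores.

    genotypes    : list of rows, one per sample; each row holds the
                   rare-allele count (0, 1 or 2) for every variant.
    variant_gene : list with the gene label of each variant column.
    Returns (genes, scores), where scores[i][j] is the total rare-allele
    count of sample i across the variants mapped to genes[j].
    This count-based rule is an assumption, not the paper's stated method.
    """
    genes = sorted(set(variant_gene))
    column = {gene: j for j, gene in enumerate(genes)}
    scores = []
    for row in genotypes:
        if len(row) != len(variant_gene):
            raise ValueError("each sample needs one count per variant")
        totals = [0] * len(genes)
        for count, gene in zip(row, variant_gene):
            totals[column[gene]] += count
        scores.append(totals)
    return genes, scores


# Hypothetical toy data: two samples, four variants in two genes.
genes, scores = collapse_by_gene(
    [[0, 1, 0, 2], [1, 0, 0, 0]],
    ["G1", "G1", "G2", "G2"],
)
```

The resulting per-gene scores could then be used as predictors in place of the individual variants; collapsing by pathway would follow the same pattern with pathway labels instead of gene labels.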
Summary. Background: Most machine learning approaches only provide a classification for binary responses. However, probabilities are required for risk estimation using individual patient characteristics. It has been shown recently that every statistical learning machine known to be consistent for a nonparametric regression problem is a probability machine that is provably consistent for this estimation problem. Objectives: The aim of this paper is to show how random forests and nearest neighbors can be used for consistent estimation of individual probabilities. Methods: Two random forest algorithms and two nearest neighbor algorithms are described in detail for estimation of individual probabilities. We discuss the consistency of random forests, nearest neighbors and other learning machines in detail. We conduct a simulation study to illustrate the validity of the methods. We exemplify the algorithms by analyzing two well-known data sets on the diagnosis of appendicitis and the diagnosi...
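The idea of using a regression learner as a probability machine can be sketched with a plain k-nearest-neighbor estimator: treating the 0/1 response as a number and averaging it over the k closest training points gives an estimate of the individual probability rather than a class label. This is a minimal sketch, not one of the algorithms described in the paper; the function name `knn_probability`, the Euclidean distance, and the toy data are assumptions.

```python
import math


def knn_probability(train_x, train_y, query, k):
    """Estimate P(Y = 1 | x = query) by averaging the 0/1 responses
    of the k nearest training points (Euclidean distance).

    train_x : list of equal-length numeric tuples (features).
    train_y : list of 0/1 responses, same length as train_x.
    """
    if not 1 <= k <= len(train_x):
        raise ValueError("k must be between 1 and the training set size")
    # Rank training points by distance to the query point.
    order = sorted(range(len(train_x)),
                   key=lambda i: math.dist(train_x[i], query))
    nearest = order[:k]
    # The mean of binary responses is a proportion, i.e. a probability estimate.
    return sum(train_y[i] for i in nearest) / k


# Hypothetical one-feature example.
xs = [(0.0,), (1.0,), (2.0,), (10.0,)]
ys = [0, 1, 1, 0]
estimate = knn_probability(xs, ys, (1.2,), k=3)
```

A random forest can be used in the same spirit by fitting it in regression mode to the 0/1 response and reading its prediction as a probability.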
Identifying gene-gene interactions is essential to understand disease susceptibility and to detect genetic architectures underlying complex diseases. Here, we aimed to develop a permutation-based methodology relying on a machine learning method, random forest (RF), to detect gene-gene interactions. Our approach, called permuted random forest (pRF), identified the top interacting single nucleotide polymorphism (SNP) pairs by estimating how much the power of a random forest classification model is influenced by removing pairwise interactions. We systematically tested our approach on a simulation study with datasets possessing various genetic constraints including heritability, number of SNPs, sample size, etc. Our methodology showed high success rates for detecting the interaction SNP pair. We also applied our approach to two bladder cancer datasets, which showed consistent results with well-studied methodologies, such as multifactor dimensionality reduction (MDR) and statistic...
Journal of The American Statistical Association - J AMER STATIST ASSN, 1982
Given one or more groups of multivariate normal samples, methods are presented for forming simultaneous confidence intervals of all ratios of linear forms of the mean vectors. The methods cover the cases of equal or unequal covariances. In a simultaneous inference context, they are multivariate extensions of a method due to Fieller (1954) for estimation of the ratio of two normal means.
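For orientation, the classical univariate Fieller construction that the abstract refers to can be written as follows. This covers only the two-mean case, not the simultaneous multivariate extension; the symbols are introduced here for illustration and are not taken from the paper. Let $\hat a$ and $\hat b$ be estimates of the two normal means, with estimated variances $v_{11}$, $v_{22}$ and covariance $v_{12}$, and let $t$ be the appropriate Student $t$ quantile.

```latex
% Fieller confidence set for the ratio \rho = a / b (univariate case)
\left\{ \rho \;:\; (\hat a - \rho\,\hat b)^2
   \;\le\; t^2 \left( v_{11} - 2\rho\, v_{12} + \rho^2 v_{22} \right) \right\}
```

The condition is a quadratic inequality in $\rho$, and the set is a finite interval when $\hat b^2 > t^2 v_{22}$, that is, when the denominator mean is significantly different from zero.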
Papers by James Malley