
    Jinko Graham

    Introduction: In genetic epidemiology, log-linear models of population risk may be used to study the effect of genotypes and exposures on the relative risk of a disease. Such models may also include gene-environment interaction terms that allow the genotypes to modify the effect of the exposure, or equivalently, the exposure to modify the effect of genotypes on the relative risk. When a measured test locus is in linkage disequilibrium with an unmeasured causal locus, exposure-related genetic structure in the population can lead to spurious gene-environment interaction; that is, to apparent gene-environment interaction at the test locus in the absence of true gene-environment interaction at the causal locus. Exposure-related genetic structure occurs when the distributions of exposures and of haplotypes at the test and causal locus both differ across population strata. A case-parent trio design can protect inference of genetic main effects from confounding bias due to genetic structur...
    Abstract: In genetic epidemiology, log-linear models of population risk may be used to study the effect of genotypes and exposures on the relative risk of a disease. Such models may also include gene-environment interaction terms that allow the genotypes to modify the effect of the exposure, or equivalently, the exposure to modify the effect of genotypes on the relative risk. When a measured test locus is in linkage disequilibrium with an unmeasured causal locus, exposure-related genetic structure in the population can lead to spurious gene-environment interaction; that is, to apparent gene-environment interaction at the test locus in the absence of true gene-environment interaction at the causal locus. Exposure-related genetic structure occurs when the distributions of exposures and of haplotypes at the test and causal locus both differ across population strata. A case-parent trio design can protect inference of genetic main effects from confounding bias due to genetic structure in t...
    This repository contains the code used to simulate an exome-sequencing study of 150 families ascertained to contain at least four members affected with lymphoid cancer. The code is contained in RMarkdown documents that detail how to generate the datasets in the Zenodo repository at https://zenodo.org/record/5797035. The software tools used are R, SLiM and shell scripts.
    Non UBC. Unreviewed. Author affiliation: Simon Fraser University. Facult
    SimRVPedigree Supplement. This is a pdf file that provides detailed information about the simulation procedure, as well as additional information for the applications discussed in the main text. (PDF 254 kb)
    approximate exact conditional inference for logistic regression models. Exact conditional inference is based on the distribution of the sufficient statistics for the parameters of interest given the sufficient statistics for the remaining nuisance parameters. Using model formula notation, users specify a logistic model and model terms of interest for exact inference. License: GPL (>= 2)
    Linkage analysis maps genetic loci for a heritable trait by identifying genomic regions with excess relatedness among individuals with similar trait values. Analysis may be conducted on related individuals from families, or on samples of unrelated individuals from a population. For allelically heterogeneous traits, population-based linkage analysis can be more powerful than genotypic-association analysis. Here, we focus on linkage analysis in a population sample, but use sequences rather than individuals as our unit of observation. Earlier investigations of sequence-based linkage mapping relied on known sequence relatedness, whereas we infer relatedness from the sequence data. We propose two ways to associate similarity in relatedness of sequences with similarity in their trait values and compare the resulting linkage methods to two genotypic-association methods. We also introduce a procedure to label case sequences as potential carriers or non-carriers of causal variants after an a...
    Objectives: Chronic primary vasculitis describes a group of complex and rare diseases that are characterized by blood vessel inflammation. Classification of vasculitis subtypes is based predominantly on the size of the involved vessels and clinical phenotype. There is a recognized need to improve classification, especially for small-to-medium sized vessel vasculitides, that, ideally, is based on the underlying biology with a view to informing treatment. Methods: We performed RNA-Seq on blood samples from children (n = 41) and from adults (n = 11) with small-to-medium sized vessel vasculitis, and used unsupervised hierarchical clustering of gene expression patterns in combination with clinical metadata to define disease subtypes. Results: Differential gene expression at the time of diagnosis separated patients into two primary endotypes that differed in the expression of ~3,800 genes in children, and ~1,600 genes in adults. These endotypes were also present during disease flares, and b...
    Background: A perfect phylogeny is a rooted binary tree that recursively partitions sequences. The nested partitions of a perfect phylogeny provide insight into the pattern of ancestry of genetic sequence data. For example, sequences may cluster together in a partition indicating that they arise from a common ancestral haplotype. Results: We present an R package to reconstruct the local perfect phylogenies underlying a sample of binary sequences. The package enables users to associate the reconstructed partitions with a user-defined partition. We describe and demonstrate the major functionality of the package. Conclusion: The package should be of use to researchers seeking insight into the ancestral structure of their sequence data. The reconstructed partitions have many applications, including the mapping of trait-influencing variants.
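    As a rough illustration of the idea behind a perfect phylogeny, the sketch below checks whether a set of binary sequences is pairwise compatible using the classical four-gamete condition: if some pair of sites shows all four combinations 00, 01, 10 and 11, no single perfect phylogeny can explain both sites. This is a minimal Python sketch under that assumption, not the algorithm or interface of the R package described above; the function name passes_four_gamete_test and the example data are hypothetical.

```python
from itertools import combinations


def passes_four_gamete_test(sequences):
    """Return True if no pair of sites carries all four gametes 00, 01, 10, 11.

    `sequences` is a list of equal-length strings over {'0', '1'}.
    Hypothetical helper for illustration only.
    """
    if not sequences:
        return True
    n_sites = len(sequences[0])
    for i, j in combinations(range(n_sites), 2):
        # Collect the distinct two-site haplotypes observed at sites i and j.
        gametes = {(seq[i], seq[j]) for seq in sequences}
        if len(gametes) == 4:
            return False
    return True


# Illustrative data (made up): the first sample is pairwise compatible,
# the second shows all four gametes at its only pair of sites.
compatible = ["0000", "1000", "1100", "0010"]
incompatible = ["00", "01", "10", "11"]

print(passes_four_gamete_test(compatible))
print(passes_four_gamete_test(incompatible))
```

    When a window of sites fails this check, one common response is to restrict attention to a smaller neighbourhood of sites around a focal point, which is one way to read the word "local" in the abstract.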
    Summary: We present the R package SimRVSequences to simulate sequence data for pedigrees. SimRVSequences allows for simulations of large numbers of SNVs and scales well with increasing numbers of pedigrees. Users provide a sample of pedigrees and single-nucleotide variant data from a sample of unrelated individuals. Availability and Implementation: SimRVSequences is publicly available on CRAN at https://cran.r-project.org/web/packages/SimRVSequences/. Supplementary information: Supplementary materials are available at Bioinformatics online.
    Background: Studies that ascertain families containing multiple relatives affected by disease can be useful for identification of causal, rare variants from next-generation sequencing data. Results: We present the R package SimRVPedigree, which allows researchers to simulate pedigrees ascertained on the basis of multiple, affected relatives. By incorporating the ascertainment process in the simulation, SimRVPedigree allows researchers to better understand the within-family patterns of relationship amongst affected individuals and ages of disease onset. Conclusions: Through simulation, we show that affected members of a family segregating a rare disease variant tend to be more numerous and cluster in relationships more closely than those for sporadic disease. We also show that the family ascertainment process can lead to apparent anticipation in the age of onset. Finally, we use simulation to gain insight into the limit on the proportion of ascertained families segregating a causal var...
    Background and Aims: Many methods can detect trait association with causal variants in candidate genomic regions; however, a comparison of their ability to localize causal variants is lacking. We extend a previous study of the detection abilities of these methods to a comparison of their localization abilities. Methods: Through coalescent simulation, we compare several popular association methods. Cases and controls are sampled from a diploid population to mimic human studies. As benchmarks for comparison, we include two methods that cluster phenotypes on the true genealogical trees, a naive Mantel test considered previously in haploid populations and an extension that takes into account whether case haplotypes carry a causal variant. We first work through a simulated dataset to illustrate the methods. We then perform a simulation study to score the localization and detection properties. Results: In our simulations, the association signal was localized least precisely by the naive M...
    Both genetic variants and brain region abnormalities are recognized to play a role in cognitive decline. We explore the association between single-nucleotide polymorphisms (SNPs) in linkage regions for Alzheimer's disease and rates of decline in brain structure using data from the Alzheimer's Disease Neuroimaging Initiative (ADNI). In an initial discovery stage, we assessed the presence of linear association between the minor allele counts of 75,845 SNPs in the Alzgene linkage regions and predicted rates of change in structural MRI measurements for 56 brain regions using an RV test. In a second, refinement stage, we reduced the number of SNPs using a bootstrap enhanced sparse canonical correlation analysis (SCCA) with a fixed tuning parameter. Each SNP was assigned an importance measure proportional to the number of times it was estimated to have a nonzero coefficient in repeated re-sampling from the ADNI-1 sample. We created refined lists of SNPs based on importance probabi...
    We studied 140 families with two or more lymphoid cancers, including non-Hodgkin lymphoma (NHL), Hodgkin lymphoma (HL), chronic lymphocytic leukemia (CLL), and multiple myeloma (MM), for deviation from the population age of onset and lymphoid cancer co-occurrence patterns. Median familial NHL, HL, CLL and MM ages of onset are substantially earlier than comparable population data. NHL, HL and CLL (but not MM) also show earlier age of onset in later generations, known as anticipation. The co-occurrence of lymphoid cancers is significantly different from that expected based on population frequencies (p < .0001), and the pattern differs more in families with more affected members (p < .0001), suggesting specific lymphoid cancer combinations have a shared genetic basis. These families provide evidence for inherited factors that increase the risk of multiple lymphoid cancers. This study was approved by the BC Cancer Agency - University of British Columbia Clinical Research Ethics Bo...
    Thesis (Ph. D.)--University of Washington, 1998 Genetic linkage studies based on pedigree data have limited resolution, due to the relatively small number of segregations. Disequilibrium mapping, which uses population associations to infer the location of a disease mutation, provides one possible strategy for narrowing the candidate region. We develop a coalescent model for the ancestry of a random sample of disease alleles, and use it to investigate population association as a tool for fine-mapping a rare disease. Recombination events may be placed on the ancestral coalescent, and define the recombinant classes, the sets of sampled disease alleles descending from the meiosis at which a given recombination occurred. All disease haplotypes within a recombinant class are identical by descent at the marker. This identity by descent underlies linkage disequilibrium, the allelic association that is due to genetic linkage. We first investigate factors influencing marker identity by descen...
    In case-control studies, covariate information often is collected on a genetic factor and a continuous attribute such as age. In some instances, it is reasonable to assume the attribute and genetic factor occur independently in the population. Under this independence assumption, we develop maximum likelihood estimators of parameters in a logistic model of disease risk. Estimates are based on data from both patients and controls and may be obtained by fitting a polychotomous regression model of joint disease and genetic status. Our results extend previous log-linear approaches to imposing independence between a genetic factor and a categorical attribute, thereby avoiding potential loss of information from discretizing a continuous attribute. In this paper, we apply the method to investigate age-specific associations between type 1 diabetes and a variant of the glutamate-cysteine ligase catalytic subunit. The results are compared to those obtained from a standard logistic regres-...
    Standard population genetic theory says that deleterious genetic variants are likely rare and fairly recently introduced. However, can this expectation lead to more powerful tests of association between diseases and rare genetic variation? The gene genealogy describes the relationships between haplotypes sampled from the general population. Although ancestral tree-based methods, inspired by the gene genealogy concept, have been developed for finding associations with common genetic variants, here we ask whether gene genealogies can help in identifying genomic regions containing multiple rare causal variants. With data simulated under several demographic models and using known gene genealogies, we developed and compared several tree-based statistics to determine which, if any, could detect the type of clustering expected with rare causal variants and whether the genealogic tree provides additional information about disease associations. We found that a novel statistic based on the sc...
    A gene genealogy describes relationships among haplotypes sampled from a population. Knowledge of the gene genealogy for a set of haplotypes is useful for estimation of population genetic parameters and it also has potential application in finding disease-predisposing genetic variants. As the true gene genealogy is unknown, Markov chain Monte Carlo (MCMC) approaches have been used to sample genealogies conditional on data at multiple genetic markers. We previously implemented an MCMC algorithm to sample from an approximation to the distribution of the gene genealogy conditional on haplotype data. Our approach samples ancestral trees, recombination and mutation rates at a genomic focal point. In this work, we describe how our sampler can be used to find disease-predisposing genetic variants in samples of cases and controls. We use a tree-based association statistic that quantifies the degree to which case haplotypes are more closely related to each other around the focal point than c...
    In the case-parent trio design, unrelated children affected with a disease are genotyped along with their parents. Information may also be collected on environmental factors in the children. The design permits estimation and testing of genetic effects and gene-by-environment interaction. Recently, it has been demonstrated that when genotypes are measured at a non-causal test locus, population stratification can create spurious interaction. That is, the environmental factor can appear to modify the disease risk associated with genotypes at the test locus without modifying the disease risk of genotypes at the causal locus. One design-based approach that is robust to spurious interaction requires the environmental factor to also be available on an unaffected sibling of the affected child. We explore the source of spurious interaction and suggest an alternate approach that mitigates its effects using case-parent triads. Our approach is based on adjusting the risk model using ancestry in...
    In disequilibrium mapping from data on a rare allele, interest may focus on the ancestry of a random sample of current descendants of a mutation. The mutation is assumed to have been introduced into the population as a single copy a known time ago and to have reached a given copy number within the population. Theory has been developed to describe the ancestral distribution under arbitrary patterns of population expansion. Further results permit convenient realization of the ancestry for a random sample of copies of a rare allele within populations of constant size or within populations growing or shrinking at constant exponential rate. In this article, we present an efficient approximate method for realizing coalescence times under more general patterns of population growth. We also apply diagnostics, checking the age of the mutation. In the course of the derivation, some additional insight is gained into the dynamics of the descendants of the mutation.
    Modern forensic DNA profiles are constructed using microsatellites, short tandem repeats of 2-5 bases. In the absence of genetic data on a crime-specific subpopulation, one tool for evaluating profile evidence is the match probability. The match probability is the conditional probability that a random person would have the profile of interest given that the suspect has it and that these people are different members of the same subpopulation. One issue in evaluating the match probability is population differentiation, which can induce coancestry among subpopulation members. Forensic assessments that ignore coancestry typically overstate the strength of evidence against the suspect. Theory has been developed to account for coancestry; assumptions include a steady-state population and a mutation model in which the allelic state after a mutation event is independent of the prior state. Under these assumptions, the joint allelic probabilities within a subpopulation may be approximated by...
    Background: In population association studies, standard methods of statistical inference assume that study subjects are independent samples. In genetic association studies, it is therefore of interest to diagnose undocumented close relationships in nominally unrelated study samples. Results: We describe the R package CrypticIBDcheck to identify pairs of closely-related subjects based on genetic marker data from single-nucleotide polymorphisms (SNPs). The package is able to accommodate SNPs in linkage disequilibrium (LD), without the need to thin the markers so that they are approximately independent in the population. Sample pairs are identified by superposing their estimated identity-by-descent (IBD) coefficients on plots of IBD coefficients for pairs of simulated subjects from one of several common close relationships. Conclusions: The methods implemented in CrypticIBDcheck are particularly relevant to candidate-gene association studies, in which dependent SNPs cluster in a relatively ...
    Complex traits result from an interplay between genes and environment. A better understanding of their joint effects can help refine understanding of the epidemiology of the trait. Various tests have been proposed to assess the statistical interaction between genes and the environment (
    The gene genealogy is a tree describing the ancestral relationships among genes sampled from unrelated individuals. Knowledge of the tree is useful for inference of population-genetic parameters and has potential application in gene-mapping. Markov chain Monte Carlo approaches that sample genealogies conditional on observed genetic data typically assume that haplotype data are observed even though commonly-used genotyping technologies provide only unphased genotype data. We have extended our haplotype-based genealogy sampler, sampletrees, to handle unphased genotype data. We use the sampled haplotype configurations as a diagnostic for adequate sampling of the tree space based on the reasoning that if haplotype sampling is restricted, sampling from the tree space will also be restricted. We compare the distributions of sampled haplotypes across multiple runs of sampletrees, and to those estimated by the phase inference program, PHASE. Performance was excellent for the majority of individuals as shown by the consistency of results across multiple runs. However, for some individuals in some datasets, sampletrees had problems sampling haplotype configurations; longer run lengths would be required for these datasets. For many datasets though, we expect that sampletrees will be useful for sampling from the posterior distribution of gene genealogies given unphased genotype data.
    In genetic association studies, there is increasing interest in understanding the joint effects of genetic and nongenetic factors. For rare diseases, the case-control study is a practical design, and logistic regression is the standard method of inference. However, the power to detect statistical interaction is a concern, even with relatively large samples. Under independence of genetic and nongenetic covariates, improved precision of interaction estimators is possible, but logistic regression does not make use of this assumption and consequently is not statistically efficient. In recent work to improve efficiency, profile likelihood methods have been used to develop semi-parametric inference that incorporates the independence assumption. We describe an alternate derivation of these estimators for rare diseases that is based on classic arguments from case-control inference. These arguments lead to a simplification in the variance estimator. We also describe a strategy for relaxing t...
    We describe a statistical approach to predict gender-labeling errors in candidate-gene association studies, when Y-chromosome markers have not been included in the genotyping set. The approach adds value to methods that consider only the heterozygosity of X-chromosome SNPs, by incorporating available information about the intensity of X-chromosome SNPs in candidate genes relative to autosomal SNPs from the same individual. To our knowledge, no published methods formalize a framework in which heterozygosity and relative intensity are simultaneously taken into account. Our method offers the advantage that, in the genotyping set, no additional space is required beyond that already assigned to X-chromosome SNPs in the candidate genes. We also show how the predictions can be used in a two-phase sampling design to estimate the gender-labeling error rates for an entire study, at a fraction of the cost of a conventional design.
    Recently, Lake et al. [Human Heredity 2003;55:56-65] have proposed an approach based on the EM algorithm for maximum-likelihood inference of trait associations with haplotypes and environmental cofactors in generalized linear models. In this short report, we describe an extension to accommodate missing SNP genotype information. We also discuss differences in the calculation of standard errors between their implementation and our own. Finally, we present results indicating that inference is robust to low levels of dependence between haplotypes and nongenetic factors, but that biased inference can result when there is moderate to strong dependence. Overall, the method is found to perform well in the models we considered.
