Association tests based on multi-marker haplotypes may be more powerful than those based on single markers. Existing association tests based on multi-marker haplotypes include Pearson's χ² test, which tests for a difference in haplotype distributions between cases and controls, and haplotype-similarity-based methods, which compare the average similarity among cases with that among controls. In this article, we propose new association tests based on haplotype similarities. These new tests compare the average similarities within cases and within controls with the average similarity between cases and controls. The methods can be applied to either phase-known or phase-unknown data. We compare the performance of the proposed methods with Pearson's χ² test and the existing similarity-based tests through simulation studies under a variety of scenarios and through the analysis of a real data set. The simulation results show that, in most cases, the newly proposed methods are more powerful than both Pearson's χ² test and the existing similarity-based tests. In one extreme case, where the disease mutant was introduced at a very rare haplotype (≤3%), Pearson's χ² test is slightly more powerful than the newly proposed methods, and the existing similarity-based tests have almost no power. In another extreme case, where the disease mutant was introduced at the most common haplotype, the existing similarity-based methods are slightly more powerful than the newly proposed methods, and Pearson's χ² test is the least powerful. The results of the real data analysis are consistent with those of our simulation studies.
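The within-versus-between comparison described above can be sketched as a permutation test. This is a minimal illustration under stated assumptions, not the authors' exact statistic: it assumes phase-known haplotypes coded as equal-length allele strings, uses a simple matching-allele similarity (the fraction of loci at which two haplotypes agree), and weights the case and control within-group averages equally. The function names are hypothetical.

```python
import random
from itertools import combinations


def similarity(h1, h2):
    """Assumed matching-allele similarity: fraction of loci where two haplotypes agree."""
    return sum(a == b for a, b in zip(h1, h2)) / len(h1)


def mean_within(group):
    """Average similarity over all distinct pairs inside one group."""
    pairs = list(combinations(group, 2))
    return sum(similarity(a, b) for a, b in pairs) / len(pairs)


def mean_between(cases, controls):
    """Average similarity over all case-control pairs."""
    total = sum(similarity(a, b) for a in cases for b in controls)
    return total / (len(cases) * len(controls))


def statistic(cases, controls):
    """Hypothetical statistic: mean within-group similarity minus mean between-group similarity."""
    within = (mean_within(cases) + mean_within(controls)) / 2
    return within - mean_between(cases, controls)


def permutation_test(cases, controls, n_perm=999, seed=0):
    """One-sided permutation p-value obtained by shuffling case/control labels."""
    rng = random.Random(seed)
    observed = statistic(cases, controls)
    pooled = list(cases) + list(controls)
    n_cases = len(cases)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if statistic(pooled[:n_cases], pooled[n_cases:]) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

Under association, haplotypes within each group resemble one another more than they resemble haplotypes in the other group, so large values of the statistic count as evidence against the null hypothesis. Phase-unknown data would first require haplotype inference or a genotype-level similarity, and this sketch does not attempt either.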
Rheumatoid arthritis is inherited in a complex manner. So far, several individual susceptibility genes, such as PTPN22, STAT4, and TRAF1-C5, have been identified. However, it is presumed that some genes may interact to have a significant effect on the disease, even though each of them plays only a modest role on its own. We propose a new combinatorial association test to detect gene-gene interactions in the rheumatoid arthritis data using multiple traits: disease status, anti-cyclic citrullinated peptide, and immunoglobulin M. Existing gene-gene interaction tests use information on only a single trait at a time. In this article, we propose a new multivariate combinatorial searching method that uses multiple traits simultaneously. The method incorporates the multiple traits into various feature-selection techniques to search for a set of disease-susceptibility genes that may interact. By analyzing three panels of markers, we have identified...
For complex diseases, the relationship between genotypes, environmental factors, and phenotype is usually complex and nonlinear. Our understanding of the genetic architecture of diseases has increased considerably in recent years. However, both conceptually and methodologically, detecting gene-gene and gene-environment interactions remains a challenge, despite the existence of a number of efficient methods. One method that offers great promise but has not yet been widely applied to genomic data is the entropy-based approach of information theory. In this article, we first develop entropy-based test statistics to identify two-way and higher-order gene-gene and gene-environment interactions. We then apply these methods to a bladder cancer data set to test their power and to identify their strengths and weaknesses. For two-way interactions, we propose an information gain (IG) approach based on mutual information. For three-way and higher-order interactions, an interaction IG approach is used. In both cases, we develop one-dimensional test statistics to analyze sparse data. Compared with the naive chi-square test, the test statistics we develop have similar or higher power and are robust. Applying them to the bladder cancer data set allowed us to investigate the complex interactions among DNA repair gene single nucleotide polymorphisms, smoking status, and bladder cancer susceptibility. Although not yet widely applied, entropy-based approaches appear to be a useful tool for detecting gene-gene and gene-environment interactions. The test statistics we develop add to a growing body of methodologies that will gradually shed light on the complex architecture of common diseases.
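The information gain idea above can be illustrated with plug-in entropy estimates computed from observed counts. This is a minimal sketch under stated assumptions: it uses one common definition of two-way interaction information gain, IG = I(G1,G2;D) − I(G1;D) − I(G2;D), which is the mutual information the pair carries about disease status beyond what each factor carries alone. The abstract does not give the exact form of the authors' one-dimensional test statistics or how they are calibrated, and the function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Plug-in Shannon entropy in bits of a sequence of hashable labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), estimated from paired observations."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def interaction_information_gain(g1, g2, d):
    """Assumed two-way IG: information the pair (g1, g2) carries about d beyond the main effects."""
    joint = list(zip(g1, g2))  # treat the genotype (or exposure) pair as a single variable
    return (mutual_information(joint, d)
            - mutual_information(g1, d)
            - mutual_information(g2, d))


# Pure-interaction (XOR-like) toy data: neither factor alone is informative,
# but together they determine disease status exactly, giving IG = 1 bit.
g1 = [0, 0, 1, 1]
g2 = [0, 1, 0, 1]
d = [0, 1, 1, 0]
ig = interaction_information_gain(g1, g2, d)
```

Either factor can be a genotype or an environmental exposure such as smoking status, since only the category labels matter. To obtain a p-value, one simple option (assumed here, not taken from the source) is to permute disease status and recompute the IG.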
Recent advances in molecular technologies have made it possible to screen hundreds of thousands of single nucleotide polymorphisms and tens of thousands of gene expression profiles. Although these data have the potential to inform investigations into disease etiologies and to advance medicine, the question of how to adequately control both type I and type II error rates remains open. The Genetic Analysis Workshop 15 datasets provided a unique opportunity for participants to evaluate multiple testing strategies applicable to microarray and single nucleotide polymorphism data. The Genetic Analysis Workshop 15 multiple testing and false discovery rate group (Group 15) investigated three general categories of multiple testing corrections, which are summarized in this review: statistical independence, error rate adjustment, and data reduction. We show that, although each approach may have certain advantages, adequate error control depends largely on the question under consideration and often requires the use of multiple analytic strategies.
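As a concrete instance of the error rate adjustment category, the Benjamini-Hochberg step-up procedure, a standard way to control the false discovery rate, is sketched below. It serves only as an illustration: the abstract does not specify which adjustment procedures Group 15 evaluated, and the function name is hypothetical.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return booleans marking hypotheses rejected at false discovery rate alpha (BH step-up)."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k with p_(k) <= k * alpha / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / m:
            k_max = rank
    # Reject the k_max hypotheses with the smallest p-values.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


pvals = [0.01, 0.045, 0.025, 0.005, 0.2]
decisions = benjamini_hochberg(pvals, alpha=0.05)
```

Because this is a step-up procedure, every hypothesis ranked at or below the largest passing rank is rejected, even if its own p-value misses its individual threshold. BH is valid under independence and under certain forms of positive dependence. For strongly correlated SNP or expression data, that assumption is one reason the independence and data reduction strategies mentioned above also come into play.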