Hi! My name is Angli Xue. I am a PhD student at the University of Queensland, working at the Institute for Molecular Bioscience under the supervision of Prof. Jian Yang. My research focuses on the genetic architecture of type 2 diabetes. Supervisors: Prof. Jian Yang and Dr. Jian Zeng. Website: https://sites.google.com/view/angli-xue/
Using latent variables in gene expression data can help correct for unobserved confounders and increase statistical power for expression quantitative trait loci (eQTL) detection. Probabilistic estimation of expression residuals (PEER) and principal component analysis (PCA) are widely used methods that can remove unwanted variation and improve eQTL discovery power in bulk RNA-seq analysis. However, their performance has not been evaluated extensively in single-cell eQTL analysis, especially across different cell types. Potential challenges arise from the structure of single-cell RNA-seq data, including sparsity, skewness, and the mean-variance relationship. Here, we show through a series of analyses that PEER and PCA require additional quality control and data transformation steps on the pseudo-bulk matrix to obtain valid latent variables; otherwise, they can yield highly correlated factors (Pearson's r = 0.63 to 0.99). Incorporating valid PEER factors (PFs)/PCs in the eQTL association model...
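The abstract does not list the exact QC and transformation steps. As a rough sketch only, the code below assumes a hypothetical mean-count filter (`min_mean`), a log1p transform, and per-gene standardisation of the pseudo-bulk matrix before computing PCs by SVD, and then measures factor redundancy as the largest pairwise Pearson correlation. It does not implement PEER, and both function names are hypothetical.

```python
import numpy as np


def pseudobulk_pcs(expr, n_pcs=5, min_mean=1.0):
    """Return PC scores from an individuals x genes pseudo-bulk matrix.

    Hypothetical QC/transformation: drop genes whose mean count is below
    `min_mean`, log1p-transform to reduce skewness, then standardise each
    gene to zero mean and unit variance before the SVD.
    """
    expr = np.asarray(expr, dtype=float)
    x = np.log1p(expr[:, expr.mean(axis=0) >= min_mean])
    x = x[:, x.std(axis=0) > 0]               # drop constant genes
    x = (x - x.mean(axis=0)) / x.std(axis=0)  # centre and scale each gene
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    return u[:, :n_pcs] * s[:n_pcs]           # one column per latent factor


def max_abs_factor_corr(factors):
    """Largest absolute pairwise Pearson correlation between factor columns."""
    r = np.corrcoef(factors, rowvar=False)
    np.fill_diagonal(r, 0.0)
    return float(np.abs(r).max())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Simulated counts: 200 individuals x 500 genes with skewed gene means.
    counts = rng.poisson(rng.gamma(0.5, 4.0, size=500), size=(200, 500))
    pcs = pseudobulk_pcs(counts, n_pcs=5)
    print(pcs.shape, max_abs_factor_corr(pcs))
```

Because the data are centred, PCs computed this way are uncorrelated by construction, so `max_abs_factor_corr` should be near zero here; the check is more informative for factors from other methods (such as PEER) or from matrices that skipped the centring step.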
The genetic regulation of post-prandial glucose levels is poorly understood. Here, we characterise the genetic architecture of blood glucose measured after fasting durations ranging from 0 to 24 h in 368,000 European-ancestry participants of the UK Biobank. We found a near-linear increase in the heritability of non-fasting glucose levels over time, which plateaus at its fasting-state value from 5 h after a meal (h² = 11%; standard error: 1%). The genetic correlation between different fasting times is > 0.77, suggesting that the genetic control of glucose is largely constant across fasting durations. Accounting for heritability differences between fasting times leads to a ~16% improvement in the discovery of genetic variants associated with glucose. Newly detected variants improve the prediction of fasting glucose and type 2 diabetes in independent samples. Finally, we meta-analysed summary statistics from genome-wide association studies of random and fasting glucose (N = 518,615) and identi...
Additional file 1: Supplementary Figures. Figure S1. Gender and age distribution of UK Biobank-derived basal cell carcinoma (BCC) cases and controls. Figure S2. Regional plots of MC1R. Figure S3. FI network of the protein-coding 'nearest genes' identified by GWAS analysis.
Additional file 2: Supplementary Tables. Table S1. Independent common variants associated with BCC in the GWAS analysis at P < 5 × 10⁻⁸. Table S2. Common variants identified by GCTA-COJO analysis of BCC at P < 5 × 10⁻⁸. Table S3. Quantification of the correlation of eQTL effects (r̂) between blood and skin samples. Table S4. BCC-associated CpG methylation sites identified via SMR analysis of the GWAS meta-analysis and mQTL data. Table S5. Mapping of the BCC-associated CpG methylation sites to BCC-associated genes via SMR analysis using the eQTLGen eQTL data. Table S6. BCC GWAS-FI network pathway analysis. Table S7. BCC GWAS-FI network GO process analysis. Table S8. BCC SMR-FI network pathway analysis. Table S9. BCC SMR-FI network GO process analysis. Table S10. PubMed search of the biological processes of SMR-HEIDI candidate genes. Table S11. BCC-associated genes identified via SMR analysis using GTEx and eQTLGen eQTL data. Table S12. Pearson's correlation of GTEx and eQTLGen eQTL data. This e...
Genotype-by-environment interaction (GEI) is a fundamental component in understanding complex trait variation. However, it remains challenging to identify genetic variants with GEI effects in humans, largely because of the small effect sizes and the difficulty of monitoring environmental fluctuations. Here, we demonstrate that GEI can be inferred from genetic variants associated with phenotypic variability in a large sample, without the need to measure environmental factors. We performed a genome-wide variance quantitative trait locus (vQTL) analysis of ~5.6 million variants in 348,501 unrelated individuals of European ancestry for 13 quantitative traits in the UK Biobank, and identified 75 significant vQTLs (P < 2.0 × 10⁻⁹) for 9 traits, particularly those related to obesity. Direct GEI analysis with five environmental factors showed that the vQTLs were strongly enriched for GEI effects. Our results indicate pervasive GEI effects for obesity-related traits and demonstrate the ...
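A vQTL test asks whether phenotypic dispersion differs across genotype groups at a variant. The abstract does not name the test; assuming a median-centred Levene's (Brown-Forsythe) test as the per-variant statistic, a minimal pure-Python sketch is below. The function name is hypothetical.

```python
from statistics import median


def levene_median(groups):
    """Brown-Forsythe (median-centred Levene) W statistic.

    `groups` is a list of phenotype lists, e.g. one list per genotype
    class (0, 1, 2 copies of the effect allele) at a single variant.
    """
    k = len(groups)
    if k < 2:
        raise ValueError("need at least two groups")
    # absolute deviations from each group's median
    z = [[abs(y - median(g)) for y in g] for g in groups]
    n = [len(zi) for zi in z]
    n_total = sum(n)
    zbar_i = [sum(zi) / len(zi) for zi in z]           # group means of deviations
    zbar = sum(sum(zi) for zi in z) / n_total          # overall mean of deviations
    between = sum(ni * (zb - zbar) ** 2 for ni, zb in zip(n, zbar_i))
    within = sum((v - zb) ** 2 for zi, zb in zip(z, zbar_i) for v in zi)
    return (n_total - k) / (k - 1) * between / within
```

Under the null of equal dispersion, W is compared against an F(k − 1, N − k) distribution; with SciPy available, `scipy.stats.levene(..., center='median')` computes the same statistic together with its p-value.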
Observational epidemiological studies have found an association between schizophrenia and breast cancer, but it is not known whether the relationship is causal. We used summary statistics from very large genome-wide association studies of schizophrenia (n = 40,675 cases and 64,643 controls) and breast cancer (n = 122,977 cases and 105,974 controls) to investigate whether the association is partly due to shared genetic risk factors and whether there is evidence of a causal relationship. Using LD-score regression, we found a small but significant genetic correlation (rG) between the two disorders (rG = 0.14, SE = 0.03, P = 4.75 × 10⁻⁸), indicating shared genetic risk factors. Using 142 genetic variants associated with schizophrenia as instrumental variables (a proxy for having schizophrenia), we estimated the causal effect of schizophrenia on breast cancer on the observed scale as b_xy = 0.032 (SE = 0.009, P = 2.3 × 10⁻⁴). A 1 SD increase in liabili...
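For a single genetic instrument z, the textbook Wald ratio estimate of the effect of exposure x (here schizophrenia) on outcome y (here breast cancer), with its first-order standard error that ignores uncertainty in the SNP-exposure effect, is shown below for illustration. Here b̂_zx and b̂_zy denote the GWAS effects of the SNP on the exposure and the outcome; the abstract does not state which estimator was used to combine the 142 instruments.

```latex
\hat{b}_{xy} = \frac{\hat{b}_{zy}}{\hat{b}_{zx}},
\qquad
\operatorname{SE}\left(\hat{b}_{xy}\right) \approx
\frac{\operatorname{SE}\left(\hat{b}_{zy}\right)}{\left|\hat{b}_{zx}\right|}
```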
Using latent variables in gene expression data can help correct spurious correlations due to unobserved confounders and increase statistical power for expression quantitative trait loci (eQTL) detection. Probabilistic Estimation of Expression Residuals (PEER) is a widely used statistical method developed to remove unwanted variation and improve eQTL discovery power in bulk RNA-seq analysis. However, its performance has not been extensively evaluated in single-cell eQTL analysis, where it is becoming commonly used. Potential challenges arise from the structure of single-cell data, including sparsity, skewness, and the mean-variance relationship. Here, we show through a series of analyses that this method requires additional quality control and data transformation steps on the pseudo-bulk matrix to obtain valid PEER factors. Using a population-scale single-cell cohort (OneK1K, N = 982), we found that generating PEER factors without further QC or transformation o...
GCTA (Genome-wide Complex Trait Analysis) is a software package initially developed to estimate the proportion of phenotypic variance explained by all genome-wide SNPs for a complex trait, and it has since been extended to many other analyses of data from genome-wide association studies (GWASs). See the software website for more information: https://yanglab.westlake.edu.cn/software/gcta/
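Estimating SNP-based variance explained starts from a genetic relationship matrix (GRM) between individuals. A minimal pure-Python sketch of the standard estimator A_jk = (1/N) Σ_i (x_ij − 2p_i)(x_ik − 2p_i) / [2p_i(1 − p_i)], over N SNPs with allele frequency p_i, is below. It is illustrative only and is not GCTA's code: it applies the same formula on the diagonal, whereas GCTA uses a different estimator for the diagonal elements, and the function name is hypothetical.

```python
def grm(genotypes):
    """Genetic relationship matrix from 0/1/2 genotypes.

    `genotypes[j][i]` is the count of the reference allele for individual j
    at SNP i. Allele frequencies are estimated from the sample itself.
    """
    n_ind = len(genotypes)
    n_snp = len(genotypes[0])
    freqs = [sum(ind[i] for ind in genotypes) / (2 * n_ind) for i in range(n_snp)]
    usable = [i for i, p in enumerate(freqs) if 0 < p < 1]  # skip monomorphic SNPs
    if not usable:
        raise ValueError("no polymorphic SNPs")
    # standardised genotypes: (x - 2p) / sqrt(2p(1 - p))
    w = [
        [(ind[i] - 2 * freqs[i]) / (2 * freqs[i] * (1 - freqs[i])) ** 0.5 for i in usable]
        for ind in genotypes
    ]
    m = len(usable)
    # average cross-product of standardised genotypes over SNPs
    return [[sum(a * b for a, b in zip(w[j], w[k])) / m for k in range(n_ind)]
            for j in range(n_ind)]
```

Because the frequencies are estimated from the same sample, each row of the resulting matrix sums to zero; in practice the GRM would then enter a mixed model (e.g., REML) to estimate the SNP-based heritability.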
Vitamin D deficiency is a candidate risk factor for a range of adverse health outcomes. In a genome-wide association study of 25-hydroxyvitamin D (25OHD) concentration in 417,580 Europeans, we identified 143 independent loci in 112 1-Mb regions, providing new insights into the physiology of vitamin D and implicating genes involved in (a) lipid and lipoprotein metabolism, (b) dermal tissue properties, and (c) the sulphonation and glucuronidation of 25OHD. Mendelian randomization models found no robust evidence that 25OHD concentration had causal effects on candidate phenotypes (e.g., BMI, psychiatric disorders), but many phenotypes had (direct or indirect) causal effects on 25OHD concentration, clarifying the relationship between 25OHD status and health.
Understanding how natural selection has shaped the genetic architecture of complex traits and diseases is important in medical and evolutionary genetics. Bayesian methods that use individual-level data have been developed to estimate multiple features of genetic architecture, including signatures of natural selection. Here, we present an enhanced method (SBayesS) that requires only GWAS summary statistics and incorporates functional genomic annotations. We analysed GWAS data with large sample sizes for 155 complex traits and detected pervasive signatures of negative selection, with diverse estimates of SNP-based heritability and polygenicity. Projecting these estimates onto a map of genetic architecture obtained from evolutionary simulations revealed relatively strong natural selection on genetic variants associated with cardiorespiratory and cognitive traits, and a relatively small number of mutational targets for diseases. Averaging across traits, the joint distribution of SNP effect...
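As background, BayesS-type models relate each SNP's effect variance to its allele frequency through a selection parameter S. A sketch of such a prior is below, with symbols chosen here for illustration: β_j is the per-allele effect of SNP j, p_j its allele frequency, π the proportion of SNPs with nonzero effects (polygenicity), σ²_β a scale parameter, and δ_0 a point mass at zero.

```latex
\beta_j \mid \pi, \sigma^2_{\beta}, S \;\sim\;
\pi \, \mathcal{N}\!\left(0,\; \left[2p_j(1-p_j)\right]^{S} \sigma^2_{\beta}\right)
+ (1-\pi)\, \delta_0
```

Under this parameterisation, S = −1 gives every causal SNP the same expected variance explained, 2p_j(1 − p_j)·E[β_j²]; a more negative S means lower-frequency variants tend to have larger effects, the pattern interpreted as a signature of negative selection.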
Background: Mendelian randomization (MR) is a method for exploring observational associations to find evidence of causality. Objective: To apply MR between multiple risk factors/phenotypic traits (exposures) and Parkinson disease (PD) in a large, unbiased manner, and to create a public resource for research. Methods: We used two-sample MR, in which summary statistics for SNPs from genome-wide association studies (GWASes) of 5,839 exposures curated on MR-Base were used to assess causal relationships with PD. We selected the highest-quality exposure GWASes for this report (n = 401). For the disease outcome, summary statistics from the largest published PD GWAS were used. For each exposure, the causal effect on PD was assessed using the inverse-variance weighted (IVW) method, followed by a range of sensitivity analyses. We used a false discovery rate (FDR)-corrected p-value < 0.05 from the IVW analysis to prioritise traits of interest. Results: We observed evidence for ca...
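The IVW step combines per-SNP summary statistics into a single causal estimate. A minimal sketch of the fixed-effect IVW estimator, equivalent to a regression of SNP-outcome effects on SNP-exposure effects through the origin with weights 1/SE², is below; the function name is hypothetical and the weights ignore uncertainty in the SNP-exposure effects.

```python
import math


def ivw_estimate(b_exp, b_out, se_out):
    """Fixed-effect inverse-variance-weighted MR estimate.

    b_exp:  per-SNP effects on the exposure
    b_out:  per-SNP effects on the outcome
    se_out: standard errors of the outcome effects
    Returns (causal estimate, standard error).
    """
    if not (len(b_exp) == len(b_out) == len(se_out)):
        raise ValueError("inputs must have the same length")
    w = [1.0 / s ** 2 for s in se_out]  # inverse-variance weights
    num = sum(wi * bx * by for wi, bx, by in zip(w, b_exp, b_out))
    den = sum(wi * bx ** 2 for wi, bx in zip(w, b_exp))
    return num / den, math.sqrt(1.0 / den)
```

When instruments disagree more than their standard errors allow, variants of IVW such as a multiplicative random-effects model inflate this standard error, which is one reason the estimate is usually followed by sensitivity analyses.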
We performed the largest genome-wide association study of Parkinson's disease (PD) to date, analysing 7.8M SNPs in 37.7K cases, 18.6K UK Biobank proxy-cases, and 1.4M controls. We identified 90 independent genome-wide significant signals across 78 loci, including 38 independent risk signals in 37 novel loci. These variants explained 26% to 36% of the heritable risk of PD. Tests of causality within a Mendelian randomization framework identified putatively causal genes for 70 risk signals. Tissue expression enrichment analysis suggested that signatures of PD loci were heavily brain-enriched, consistent with the specific neuronal cell types implicated by single-cell expression data. We found significant genetic correlations with brain volumes, smoking status, and educational attainment. In sum, these data provide the most comprehensive understanding of the genetic architecture of PD to date by revealing many additional PD risk loci, providing a biological context for these risk factors, and ...
Understanding the differences in genetic regulation of gene expression between brain and blood is important for discovering genes for brain-related traits and disorders. Here, we estimate the correlation of genetic effects at the top-associated cis-expression or cis-DNA methylation (DNAm) quantitative trait loci (cis-eQTLs or cis-mQTLs) between brain and blood (r). Using publicly available data, we find that genetic effects at the top cis-eQTLs or cis-mQTLs are highly correlated between independent brain and blood samples ([Formula: see text] for cis-eQTLs and [Formula: see text] for cis-mQTLs). Using meta-analyzed brain cis-eQTL/mQTL data (n = 526 to 1,194), we identify 61 genes and 167 DNAm sites associated with four brain-related phenotypes, most of which are among the discoveries (97 genes and 295 DNAm sites) made using blood data with larger sample sizes (n = 1,980 to 14,115). Our results demonstrate the gain of power in gene discovery for brain-related phenotypes using blood cis-...
Genome-wide association studies (GWAS) have discovered numerous genetic variants associated with human behavioural traits. However, behavioural traits are subject to misreports and longitudinal changes (MLC), which can cause biases in GWAS and follow-up analyses. Here, we demonstrate that individuals with a higher disease burden in the UK Biobank (n = 455,607) are more likely to misreport or reduce their alcohol consumption (AC) levels, and we propose a correction procedure to mitigate the MLC-induced biases. The AC GWAS signals removed by the MLC corrections are enriched in metabolic/cardiovascular traits. Almost all the previously reported negative estimates of genetic correlations between AC and common diseases become positive or non-significant after the MLC corrections. We also observe MLC biases for smoking and physical activity in the UK Biobank. Our findings provide a plausible explanation for the controversy about the effects of AC on health outcomes and a caution for future analyses of self-reported behavioural traits in biobank data.
We conducted a meta-analysis of genome-wide association studies (GWAS) with ~16 million genotyped/imputed genetic variants in 62,892 type 2 diabetes (T2D) cases and 596,424 controls of European ancestry. We identified 139 common and 4 rare (minor allele frequency < 0.01) variants associated with T2D, 42 of which (39 common and 3 rare variants) were independent of the known variants. Integration of gene expression data from blood (n = 14,115 and 2,765) and other T2D-relevant tissues (n up to 385) with the GWAS results identified 33 putative functional genes for T2D, three of which were targeted by approved drugs. Further integration of DNA methylation (n = 1,980) and epigenomic annotation data highlighted three putative T2D genes (CAMK1D, TP53INP1 and ATP5G1) with plausible regulatory mechanisms whereby a genetic variant exerts an effect on T2D through epigenetic regulation of gene expression. We further found evidence that the T2D-associated loci have been under purifying selection.
Papers by Angli Xue