Discovering Candidate Genes Regulated by GWAS Signals in Cis and Trans
Contributing author: spal4@ncsu.edu)
Abstract
Understanding the genetic underpinnings of complex traits and diseases has been greatly advanced by genome-wide association studies (GWAS). However, a significant portion of trait heritability remains unexplained, known as “missing heritability”. Most GWAS loci reside in non-coding regions, posing challenges in understanding their functional impact. Integrating GWAS with functional genomic data, such as expression quantitative trait loci (eQTLs), can bridge this gap. This study introduces a novel approach to discover candidate genes regulated by GWAS signals in both cis and trans. Unlike existing eQTL studies that focus solely on cis-eQTLs or consider cis- and trans-QTLs separately, we utilize adaptive statistical metrics that can reflect both the strong, sparse effects of cis-eQTLs and the weak, dense effects of trans-eQTLs. Consequently, candidate genes regulated by the joint effects can be prioritized. We demonstrate the efficiency of our method through theoretical and numerical analyses and apply it to adipose eQTL data from the METabolic Syndrome in Men (METSIM) study, uncovering genes playing important roles in the regulatory networks influencing cardiometabolic traits. Our findings offer new insights into the genetic regulation of complex traits and present a practical framework for identifying key regulatory genes based on joint eQTL effects.
Keywords: Berk-Johns; Cardiometabolic traits; Core genes; eGene discovery; Higher criticism; Trans-eQTL
1 Introduction
A central goal of genetics is to understand how genetic variation, environment factors, and other sources of variations influence traits. Over the past decades, genome-wide association studies (GWASs), facilitated by high-throughput sequencing technologies, have revolutionized human genetics. However, GWAS have also left several outstanding questions: the identified GWAS loci have explained only a small percentage of trait heritability, known as the “missing heritability” problem; as sample sizes for GWAS continue to grow, the effect sizes of newly identified loci are mostly very weak; and the majority of GWAS loci are located in non-coding putatively regulatory regions of the genome, leaving it unclear which genes they regulate and how these genes function in pathways to impact the traits. These difficulties in interpreting GWAS results have hindered our efforts to understand disease etiologies and develop new therapies (Visscher et al., 2017; Cano-Gamez and Trynka, 2020).
To tackle these challenges, emerging studies integrate GWAS results with functional genomic data such as gene expression or chromatin activity profiles assayed across a range of cell types and tissues. Many large consortia, such as GTEx (Ardlie et al., 2015), have generated gene expression data for different tissues and cell lines together with genotype information. Such data have been extensively analyzed to study the regulatory effects of SNPs, i.e., expression quantitative trait loci (eQTLs), on gene expression across human tissues, and the results have proven informative for GWAS analysis. For example, transcriptome-wide association studies (TWAS) and colocalization analysis have been developed in recent years to identify genes causally involved in complex diseases based on GWAS and eQTL results. TWAS leverages information from eQTL catalogs to impute gene expression values and directly infer the association between traits and the imputed gene expression for the gene under investigation (Gusev et al., 2016; Gamazon et al., 2015). In contrast, colocalization analysis integrates association results from GWAS and eQTL mapping to identify instances where both traits share a causal variant. The rationale is that if a GWAS trait and a gene’s expression share the same associated SNP, it may suggest a regulatory (and thus putative causal) role of the SNP mediated through the gene on the GWAS trait (Cannon and Mohlke, 2018; Deng and Pan, 2020).
Integrative analyses provide important observations that for most complex traits, the most significant signals are located near genes that make functional sense, although those signals and genes contribute only a small fraction of the total heritability. In eQTL analysis, such nearby signals are called cis-acting eQTLs (cis-eQTL). Because each cis-eQTL influences only one phenotype (the local gene), they cannot elucidate the co-regulatory landscape of the transcriptome and thus the higher organization of gene regulation (Westra et al., 2013; Bryois et al., 2014; Smemo et al., 2014; Claussnitzer et al., 2015; Brynedal et al., 2017).
It has been observed that heritability is dominated by variation in a large number of non-coding common variants spread broadly across the genome and mediated through a wide range of gene functional categories. Such distant variants are called trans-acting eQTLs (trans-eQTLs) (Price et al., 2008; Zhu et al., 2016; Yao et al., 2017; Civelek et al., 2017). An emerging theory known as the omnigenic model suggests that complex traits or diseases are governed by possibly all genes expressed in relevant cells through network behaviors. Among these genes, there is a set of “core genes” directly involved in the biological pathways for each trait and a larger number of “peripheral genes” that can only affect the trait indirectly by connecting with these core genes via regulatory networks. The omnigenic model indicates that cis and trans effects of trait-associated variation are mediated by core genes, which are directly involved in the biological pathways for the trait. Moreover, it is discussed that for typical genes, the majority of expression heritability is caused by trans-eQTLs (Boyle et al., 2017; Liu et al., 2019).
However, for purely practical reasons, most eQTL mapping studies to date have concentrated on cis-eQTLs. This is because trans-eQTLs often have small regulatory effects in humans, making them notoriously difficult to identify. Analyses involving trans-eQTLs often require substantially large sample sizes, significant computational resources, and extremely stringent multiplicity adjustments.
In this paper, we propose a method to directly identify expression genes (eGenes) whose heritability is influenced by GWAS signals in both cis and trans, bypassing the need to identify weak trans-eQTLs individually. Our approach uses statistical metrics sensitive to both cis- and trans-acting signals – cis signals being relatively strong but sparse, and trans signals being relatively weak but dense. The eGenes prioritized by these metrics can offer valuable insights into the mechanisms underlying disease-related traits and may serve as candidates for core genes in omnigenic models.
The statistical metrics utilized in our method for eGene discovery have been studied in high-dimensional inference and discovered as being optimally adaptive for heterogeneous and heteroscedastic mixture model detection (Donoho and Jin, 2004; Jager and Wellner, 2007; Cai et al., 2011; Jeng et al., 2013, 2020). In the context of eQTL analysis, we investigate their adaptivity to both cis- and trans-acting signals, providing theoretical insights into their efficiency. Numerical analyses are conducted to highlight the advantages of our method in discovering three types of eGenes: those regulated exclusively by cis-eQTLs, those regulated solely by trans-eQTLs, and those influenced by both. We apply this new method to analyze adipose eQTL data from the METabolic Syndrome in Men (METSIM) study. The results provide valuable insights into the genes that play important roles in the regulatory networks for cardiometabolic traits.
2 Methodology
2.1 Formulation
Suppose there are SNPs, which are identified through GWAS to be associated with at least one of the disease-related traits under consideration. We consider all genes as potentially regulated by one or more of the GWAS signals. We obtain the summary statistics , , , from marginal association tests between the SNPs and the genes. For a given gene , assume that
where and represent the set of indices for noise variants and eQTL signals, respectively, which vary from gene to gene. Moreover, and represent the noise and signal distribution of , respectively, which also vary from gene to gene. In this model, only the null distribution is specified. The other components, , , and , are unknown.
Specifically, we expect that for a gene regulated solely by cis-eQTLs, its set should be small, and its signal distribution would concentrate more on strong intensities. On the other hand, for a gene regulated by multiple trans-eQTLs, its set should be relatively larger, and its signal distribution would concentrate more on weak intensities. For a gene influenced by both cis- and trans-eQTLs, its signal distribution could span both strong and weak intensities. Our goal is to discover all these different types of eGenes by prioritizing them over irrelevant genes and identify them with statistical guarantees.
2.2 Adaptive Metrics and Algorithm
A key component of our proposed eGene discovery method is the use of a statistical metric that has been identified as optimally adaptive for detecting various types of mixture models. This metric belongs to a general family of goodness-of-fit (GOF) tests, known for demonstrating asymptotic optimality in detecting weak and rare signals amidst a large number of noise variables (Donoho and Jin, 2004; Jager and Wellner, 2007; Donoho and Jin, 2008; Cai et al., 2011; Jeng et al., 2013, 2020). Two members in the GOF family, the Higher Criticism (HC) test and the Berk-Jones (BJ) test, have also been extensively studied in the literature. Specifically, the HC test is renowned for its effectiveness in identifying rare signals, while the BJ test is recognized for its robustness across different signal patterns in finite samples (Li and Siegmund, 2015; Barnett et al., 2017; Zhang et al., 2020; Sun and Lin, 2020).
The HC and BJ statistics are defined as follows: For a gene , derive the -values of ; order the -values increasingly so that ; and define
(1) |
and
(2) |
Both HC and BJ statistics were designed to detect departures from a specified null hypothesis that all the data points follow the null distribution U (Berk and Jones, 1979; Donoho and Jin, 2004). These statistics compare the ordered -values to their expected values under the null hypothesis and assess the maximum deviation of a weighted function of the differences between the observed and expected -values. The weights are chosen to give more importance to the regions where the differences are most significant. Larger values of the statistics indicate stronger evidence that the data come from a mixture of noise and signal distributions rather than just the noise distribution.
Next, we outline the detailed steps for the proposed eGene discovery method, which utilizes either HC or BJ statistic as the adaptive metric () to capture the joint regulatory effects of the GWAS signals in cis and trans. The genes are ranked based on this adaptive metric, and candidate eGenes are identified with statistical significance.
Step-by-Step Procedure:
-
1.
Identify the GWAS signals for a set of traits related to a specific disease.
-
2.
Obtain the -values for the marginal association tests between the GWAS signals and all the genes under consideration.
- 3.
-
4.
Rank the genes according to the values of ’s from largest to smallest.
-
5.
Calculate the -values of for all .
-
6.
Select candidate eGenes with significant -values through multiple testing.
2.3 Theoretical Insight
For a given gene , the problem of detecting the existence of any eQTL signals can be formulated as a global hypothesis testing problem. Let , representing the noise distribution after data standardization, and , representing the signal distribution, where both and are unknown. Then, an irrelevant gene that is not influenced by any of the SNPs corresponds to the global null hypothesis
(3) |
and a candidate gene regulated by one or more of the SNPs corresponds to the alternative hypothesis
(4) |
where represents the proportion of eQTL signals among all the SNPs, which varies from gene to gene.
To gain theoretical insight into the effect of eQTL signal proportion, as well as the impact of signal mean and variance on the global testing problem, we re-parameterize some model parameters as in Cai et al. (2011). This re-parameterization allows their effects to be represented on comparable scales and analyzed jointly. Specifically, let
(5) |
so that the signal sparsity is represented through the parameter in a constant scale with representing the extremely dense case where all SNPs are eQTL signals and representing the extremely sparse case where there is only one eQTL signal. Then, depending on the value of , we consider two scenarios: the sparse scenario with and the dense scenario with . In these two scenarios, the signal mean value is re-parameterized differently:
(6) |
where is at the constant scale and represents the signal mean effect. The above re-parameterization reflects two different types of signals, one being sparse () and relatively strong (), the other being dense () and relatively weak ().
After the re-parameterization, all the parameters , , and are at a constant scale, which facilitates the analysis of their joint effects. Notably, the detection boundary for the global testing problem has been found in Cai et al. (2011) and Jeng et al. (2013), assuming independence among . Its specific form varies across different scenarios as follows:
When signals are relatively sparse with ,
(7) |
When signals are relatively dense with ,
(8) |
The detection boundary results can serve as a benchmark for method evaluation as it implies the following three aspects:
-
1.
When the signal mean value does not reach the detection boundary, i.e.,
(9) no methods can consistently separate from , i.e.,
-
2.
If a method can consistently separate from with
whenever the signal mean value passes the detection boundary, i.e.,
(10) then this method is considered an optimal method.
-
3.
If a method can achieve optimality and its implementation does not require specified values of the model parameters, then the method is considered an optimally adaptive method.
In the literature, it has been proved that all the members in the aforementioned GOF family are optimally adaptive for the global testing problem in the sparse scenario () with equal variance ()(Jager and Wellner, 2007). Moreover, the optimal adaptivity of HC test has also been proved in the extended scenarios with () and possibly unequal variance () (Cai et al., 2011; Jeng et al., 2013). We expect that the BJ test performs comparatively to the HC test in the extended scenarios, and this conjecture is supported in the following numerical studies.
3 Simulation
3.1 Simulation Setup
The simulation study aims at checking the competence and efficiency of the proposed eGene discovery method.
In the study, we consider genes and SNPs.
We design three types of eGenes as depicted in Figure 1:
Cis-eGene: illustrated by Gene 1, which is only regulated by one cis-eQTL (SNP 1).
Mixed-eGene: illustrated by Gene 2, which is co-regulated by one cis-eQTL (SNP 2) and a set of trans-eQTLs (a random subset from SNPs 3 - 1002).
Trans-eGene: illustrated by Genes 3 - 20, each of which are regulated by a set of trans-eQTLs (random subsets from SNPs 3 - 1002).
The remaining genes (Genes 21 - 100) are not associated with any of the SNPs and are not depicted in Figure 1.
We generate the summary statistics for the association tests between the 100 genes and SNPs as follows: For , let
For (Gene 1), and , representing the effect of one strong cis-eQTL. For (Gene 2), and , representing the mixed effect of one strong cis-eQTL and many weak trans-eQTLs. For (Genes 3 - 20), and , representing the effect of many weak trans-eQTLs. Note that are randomly selected subsets of SNPs from SNPs 3 - 1002 with a common cardinality . For (Genes 21 - 100), , so that they are irrelevant genes. More specific values of the model parameters in the simulation examples include , or 0.2, or 1.5, and or 50. Note that represents the case where the weak trans-eQTLs barely show a non-zero mean effect, and their effects are solely demonstrated through increased variability ().
3.2 Methods Comparison
We apply the proposed eGene discovery procedure, as described in Section 2.2, to the simulated data. Here, we focus on evaluating the efficiency of the proposed method in prioritizing true eGenes over irrelevant ones. We compare both the HC version and the BJ version of our procedure with other existing methods that use different information pooling strategies to summarize the eQTL effects received by a gene. These methods are summarized as follows:
-
(a)
Mean-based method: For a given gene k, summarize the eQTL effects it receives from all the SNPs through the mean value .
-
(b)
MinimumP-based method: For a given gene k, summarize the eQTL effects through the minimum -value of : .
-
(c)
HC-based method: Use the HC statistic in (1) to summarize the eQTL effects for each gene.
-
(d)
BJ-based method: Use the BJ Statistic in (2) to summarize the eQTL effects for each gene.
Both the Mean- and MinimumP-based methods are commonly used approaches for information pooling. We compare the finite-sample performances of these methods in simulation settings designed to reflect different types of eGenes. The comparisons focus on the ability of these methods to prioritize different types of eGenes (Genes 1–20) over the irrelevant genes (Genes 21–100).
The results are presented using the Precision-Recall (PR) curve. The PR curve is particularly effective for evaluating the performance of a method in prioritizing relevant variables over irrelevant variables or, in other words, the ranking efficiency of a method. Here, the four methods in (a) - (d) rank the genes differently according to different information pooling approaches. To evaluate the ranking efficiency of these methods, a series of two metrics: Precision and Recall, are calculated along the ranked genes for each method. That is, for a list of ranked genes by a method, calculate
for all . As the rank r increasing from 1 to 100, Recallr increases to reach 1 eventually while Precisionr generally decreases. The PR curve is plotted with Recallr on the x-axis and Precisionr on the y-axis, and the area under the PR curve (AUC-PR) is a single scalar value that summarizes the overall performance of the method in prioritizing true eGenes over irrelevant genes. Compared to the commonly used ROC curve, PR curve is more appropriate when dealing with imbalanced datasets where the number of signals is expected to be much less than that of the noise variables.
3.3 Simulation Results
Figure 2 shows that in the settings where different types of eGenes co-exist and the trans-eGenes are the majority, the MinimumP method performs the worst among the four methods in the left panel, where trans-eQTL signals are relatively weak and dense (). Conversely, in the right panel where trans-eQTL signals are stronger and sparser (), the Mean-based method performs the worst. In both scenarios, the BJ and HC methods exhibit more stable results and outperform the other two methods, with BJ and HC performing comparably.
Figure 2 presents results with a zero mean value for trans-eQTLs (). Similar results are observed with and are thus omitted to save space.
4 Real Data Analysis
We applied the proposed eGene discovery method to adipose eQTL data from the METSIM study (Laakso et al., 2017; Raulerson et al., 2019). This dataset consisted of RNA-seq data generated on subcutaneous adipose tissue biopsies from 426 males from Kuopio, Finland. While Raulerson et al. (2019) primarily investigated adipose cis-eQTL ( 1Mb) colocalized with GWAS signals, we aimed here to discover eGenes regulated by GWAS variants in both cis and trans ( 1Mb). The RNA-seq data quality control, normalization, and covariates were described in Raulerson et al. (2019). Trans-eQTL analyses excluded pseudogenes and genes with low mappability, and we used 3 PEER factors as covariates for both cis and trans analyses. We obtained the -values for the marginal association tests between the 2,879 GWAS signals (minor allele frequency and imputation ) and 24,371 genes described in that study. We found that a small number of genes with only one insignificant eQTL data point can cause errors when deriving their BJ p-values using the SetTest package. Consequently, we excluded these genes and conducted our analysis with the remaining 21,593 genes.
As detailed in Section 2.2, we calculated the BJ statistics and ranked the genes accordingly. We further derived the BJ -values of all the 21593 genes and applied multiple testing (Bonferroni ) to select statistically significant eGenes. This resulted in the selection of 424 genes. The complete list of selected genes, ranked by their BJ statistics, is provided in the Appendix.
Figure 3 presents the histogram of the BJ -values of the 424 selected genes in the first panel. It can been seen that most of the selected genes have their BJ p-values much lower than the Bonferroni threshold (4.103237e-07). The results are supported by the outcomes of the HC method as shown in the second panel of Figure 3, where the HC -values of the selected 424 genes are also generally very small.
We further compared our results with the eGenes identified in Raulerson et al. (2019), where 318 eGenes (287 primary and 31 secondary) are identified using only the cis-eQTL data. We found 134 common genes between the two sets of results. As illustrated in the third panel of Figure 3, the matching proportion along our list of ranked selected genes generally decreases from 0.67 to 0.32, indicating that our method effectively prioritize true eGenes over irrelevant ones.
It is noteworthy that, different from Raulerson et al. (2019), our study simultaneously analyzes both cis and trans-eQTL data and captures the joint cis- and trans-acting effects on the genes. This approach could enhance the discovery of eGenes that play significant roles in encoding multifunctional proteins or regulating various cellular processes. The top 5 genes, as presented in Table 1, illustrate such examples.
Rank | Gene Name | Function Interpretation |
---|---|---|
1 | C1QTNF4 | Serum C1q/TNF-Related Protein 4 levels have been shown to be |
lower in type 2 diabetes patients with nonalcoholic fatty liver | ||
disease (NAFLD) than those without NAFLD. | ||
2 | WDR6 | Upregulation of WD40 repeat-containing protein 6 levels |
has been reported to drive hepatic de novo lipogenesis | ||
during insulin resistance. | ||
3 | AC034243.1 | This long noncoding RNA is transcribed in the antisense direction |
from the transcription start site of CTNNA1, which encodes | ||
catenin alpha 1 and has been implicated in cancer. | ||
4 | HLA-DQA1 | This component of the major histocompatibility complex plays a |
keyrole in the immune system. | ||
5 | PEX6 | Peroxisomal biogenesis factor 6 plays a role in peroxisomal |
protein import. |
5 Conclusion and Discussion
In this paper, we present a new approach to discovering candidate eGenes by detecting the joint effects of cis- and trans-eQTLs. Leveraging both cis and trans-eQTL data, our method offers a more comprehensive understanding of gene regulation mechanisms compared to traditional methods that consider these effects separately. The novel application of the statistically adaptive metrics enhances the prioritization of key regulatory genes involved in complex traits and diseases.
Applying our approach to adipose eQTL data from the METSIM study, we discovered eGenes regulated by GWAS signals in both cis and trans across a broad spectrum of 93 cardiometabolic traits. This analysis identified 424 significant eGenes, exceeding the number of genes identified by methods that focused solely on cis-eQTLs. The top-ranked eGenes selected by our method play significant roles in encoding multifunctional proteins and regulating various cellular processes. These findings underscore the potential of our eGene discovery method in uncovering complex genetic regulatory mechanisms. The identified eGenes provide valuable insights into the genetic architecture of complex traits and offer promising targets for further functional studies and therapeutic interventions.
Acknowledgments. The authors would like to thank Dr. Sarah Brotman and Dr. Karen Mohlke for their assistance with accessing eQTL data from the METSIM study. The authors also express their gratitude to Dr. Zheyang Wu and Dr. Hong Zhang for their support with the SetTest software.
Declarations.
Data availability The adipose eQTL data used in this paper is sourced from Raulerson et al. (2019) and is publicly accessible.
Appendix A: Complete list of selected genes
[1] C1QTNF4 WDR6 AC034243.1 HLA-DQA1 PEX6 [6] UHRF1BP1 ERAP2 HLA-C HSD17B12 RBM6 [11] CDK2AP1 HLA-DQB1 LINC00680 WNT6 CERS4 [16] CDC37P1 AC016683.6 AP006621.1 PAX8 JMJD7 [21] DALRD3 HLA-DQB1-AS1 XXbac-BPG254F23.6 MLXIPL PM20D1 [26] SNX10 CTD-2651B20.3 XXbac-BPGBPG55C20.1 RP11-282O18.3 NBPF3 [31] NCKIPSD RHD XKR9 ADORA2B HLA-DRB1 [36] NPC1L1 MIR1307 AP006621.5 XXbac-BPG299F13.17 ITGB6 [41] AS3MT GNL3 SPTBN5 MAN2C1 DISP2 [46] C2orf74 C10orf32-ASMT LITAF RP11-217B7.2 RP11-378A13.1 [51] NPC1 KIAA1841 KIAA1875 FN3KRP GBAP1 [56] RP11-65I12.1 JMJD7-PLA2G4B XXbac-BPG248L24.12 RP11-73M18.9 ADH1A [61] ZSWIM7 C15orf38-AP3S2 ANAPC4 RP11-296I10.6 MED24 [66] NPIPB7 RHCE AC034220.3 RP11-109L13.1 ZNF600 [71] CEP192 LILRA3 SERBP1P3 C12orf65 AC068039.4 [76] RP11-1348G14.4 GNMT RPL36P4 TIPARP GTF2H1 [81] SMIM19 AC145343.2 AOC1 EXOSC6 RP11-755F10.1 [86] DGKQ MAST4-AS1 RP11-624M8.1 STAG3L1 LILRB2 [91] RFT1 AC018720.10 PLA2G4B GSDMA EIF3KP1 [96] NR1H3 NT5DC2 NPIPA5 INO80E PPID [101] RP11-57H14.2 SH2B1 MEGF9 ERAP1 NDUFAF1 [106] CDK5RAP3 RP11-673E1.1 AP3S2 EIF3C PIDD [111] AMT PLCE1 CDH13 PSMD5-AS1 MAP1LC3A [116] TRIP4 RP11-179B2.2 HLA-DRB6 SLC47A1 RP11-252K23.2 [121] STRA13 CSNK2B XXbac-BPGBPG55C20.2 ITIH4 RP11-529H20.6 [126] SLC5A11 VEGFB KIAA1683 ATXN2L RPS26 [131] C10orf32 AL049840.1 CTD-2303H24.2 RP4-712E4.1 RP11-244F12.3 [136] USP32P2 MYEOV AP006621.6 CTC-228N24.3 CLEC18A [141] RP11-69E11.4 TMEM180 SLC28A2 CRYZ CEACAMP3 [146] CCDC12 CHST8 NPIPA3 NMRAL1 RP11-148O21.6 [151] APOB NPIPB6 RP11-211G23.2 RP1-140A9.1 EMC1 [156] MST1L CCDC116 SRR PEPD C15orf57 [161] BMP8A ARHGAP1 MAP3K11 SH3PXD2B G6PC2 [166] SCAPER SPPL3 RP11-397E7.4 SULT1A2 RP11-17E13.2 [171] TCP11 HLA-B BTN3A2 KLF14 XXbac-BPG300A18.13 [176] AP3B2 LINC00310 TRIM73 CLUAP1 RP11-419C5.2 [181] HSD17B13 KLHL31 RP1-313I6.12 OPRL1 SLC35G5 [186] LRRC37A SDHDP6 POM121C RP11-378J18.8 SYPL2 [191] HEY2 RNF5 KAT8 TMBIM1 TCEA3 [196] HERPUD1 HLA-DRA PRMT6 RP13-753N3.1 CCDC163P [201] AFF1 HMGN1 LACTB PACS1 GUSBP4 [206] RP11-64K12.4 LINC00202-1 CCDC36 SERPINC1 ARL14EP [211] NPIPA1 AMIGO1 LGALS9C ROM1 MMP16 [216] TIGD7 CNTN2 RPAP1 PDLIM4 PEMT [221] CWF19L1 INVS CELSR2 MARVELD3 RP11-107F6.3 [226] EYA1 TRMT61B RP11-680G24.5 RP11-324E6.9 SUZ12P [231] PKD1 WARS2 LHCGR AHSA2 LRRC37A2 [236] RP5-874C20.3 AC138783.12 ANK1 SLC7A9 CCHCR1 [241] TMEM60 ZNF589 LMOD1 MICA RP11-502I4.3 [246] NUDCD3 RNF157 ADAM1A ADCY10P1 VPS53 [251] TRIM66 MRPS18AP1 MYBPC3 MARK3 KIAA1161 [256] GYPE CTD-2260A17.3 ALMS1P RPP25 RP11-680G24.4 [261] TOM1L2 PLEC NBPF1 TMEM9B-AS1 CPNE1 [266] HLA-DQA2 NFKBIL1 RP3-465N24.5 IL20RB UGT1A7 [271] AL022393.7 ZSCAN31 UGT1A9 UGT1A8 UGT1A3 [276] UGT1A5 RP11-114H24.5 HCAR2 ZC3H12C IRS1 [281] ELP6 RAN C18orf8 SLC9B1 LONRF1 [286] FAM106A OXCT2 SMG5 RP11-493E12.2 TTC12 [291] PHC1 NEURL2 UGT1A1 TRIM24 RMDN1 [296] CEACAM21 TLK1 UGT1A10 PDXDC2P GS1-259H13.2 [301] LRRC36 SFI1 SPATA5L1 DHRS11 EIF3CL [306] OXCT2P1 TSPY26P MRPL45P2 RP5-966M1.6 LPIN3 [311] RP11-593F23.1 RP11-339B21.14 LINC00674 UGT1A6 REEP2 [316] UGT1A4 SIL1 LRRC45 ABLIM3 NDUFS1 [321] GPIHBP1 APOC1P1 PLTP RP11-1348G14.1 BEND6 [326] SAA3P SULT1A1 HLA-DRB5 SURF1 RAC1 [331] ADIPOQ-AS1 EFCAB13 FIGNL1 HCAR1 STAG3L2 [336] AC131056.3 KNOP1 CYP17A1 CTNNAL1 CTC-510F12.4 [341] NPPA-AS1 GPR180 TEX264 RP11-304L19.3 MT1E [346] MST1R RP11-817O13.8 MKX GOSR1 RPL8 [351] ANXA5 GALNT2 RP1-265C24.5 PARP6 TCF19 [356] SSC5D ENDOG RP11-196G11.2 PCNX AF131215.9 [361] GATSL3 PRPH2 FGF9 PTH1R ZNF577 [366] FAM182A TTYH1 KHK RP11-457M11.5 RMI2 [371] SPC25 AC007390.5 RP1-130H16.18 CAMK2G PDE3A [376] TMEM116 TNXA RP11-573D15.1 CTA-85E5.10 AC138430.4 [381] DNAH17 RP11-408A13.4 C4B OLFM2 NUP85 [386] PTPN12 CYP17A1-AS1 B3GALNT2 RP1-18D14.7 RSPO3 [391] RP11-24N18.1 LRRC37A6P GATM NAP1L4 RP11-7F17.5 [396] RP11-433P17.3 DAP3 CYBRD1 TMEM110 PABPC4 [401] RP5-968P14.2 C1orf167 GFPT1 RP11-345M22.2 AC010883.5 [406] AC012146.7 RP11-10L12.4 CROCCP2 RP11-111A22.1 LINC00886 [411] AC006129.4 GRK4 TBKBP1 CMIP ZNF781 [416] MST1 CCL3L3 YPEL3 HAPLN4 SKIV2L [421] ZNF703 RFC4 PEG3 UFSP1
References
- Ardlie et al. (2015) Ardlie, K. G., D. S. Deluca, A. V. Segrè, T. J. Sullivan, T. R. Young, E. T. Gelfand, C. A. Trowbridge, J. B. Maller, T. Tukiainen, et al. (2015). The genotype-tissue expression (gtex) pilot analysis: Multitissue gene regulation in humans. Science 348(6235), 648–660. PMCID: PMC4547484.
- Barnett et al. (2017) Barnett, I., R. Mukherjee, and X. Lin (2017). The generalized higher criticism for testing snp-set effects in genetic association studies. Journal of the American Statistical Association 112(517), 64–76.
- Berk and Jones (1979) Berk, R. H. and D. H. Jones (1979). Goodness-of-fit test statistics that dominate the kolmogorov statistics. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 47(1), 47–59.
- Boyle et al. (2017) Boyle, E. A., Y. I. Li, and J. K. Pritchard (2017). An expanded view of complex traits: from polygenic to omnigenic. Cell 169(7), 1177–1186.
- Brynedal et al. (2017) Brynedal, B., J. Choi, T. Raj, R. Bjornson, B. E. Stranger, B. M. Neale, B. F. Voight, and C. Cotsapas (2017). Large-scale trans-eqtls affect hundreds of transcripts and mediate patterns of transcriptional co-regulation. The American Journal of Human Genetics 100(4), 581–591.
- Bryois et al. (2014) Bryois, J., A. Buil, D. M. Evans, J. P. Kemp, S. B. Montgomery, D. F. Conrad, K. M. Ho, S. Ring, M. Hurles, P. Deloukas, et al. (2014). Cis and trans effects of human genomic variants on gene expression. PLoS genetics 10(7), e1004461.
- Cai et al. (2011) Cai, T. T., X. J. Jeng, and J. Jin (2011). Optimal detection of heterogeneous and heteroscedastic mixtures. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 73(5), 629–662.
- Cannon and Mohlke (2018) Cannon, M. E. and K. L. Mohlke (2018). Deciphering the emerging complexities of molecular mechanisms at gwas loci. The American Journal of Human Genetics 103(5), 637–653.
- Cano-Gamez and Trynka (2020) Cano-Gamez, E. and G. Trynka (2020). From gwas to function: using functional genomics to identify the mechanisms underlying complex diseases. Frontiers in genetics 11, 424.
- Civelek et al. (2017) Civelek, M., Y. Wu, C. Pan, C. K. Raulerson, A. Ko, A. He, C. Tilford, N. K. Saleem, A. Stančáková, L. J. Scott, et al. (2017). Genetic regulation of adipose gene expression and cardio-metabolic traits. The American Journal of Human Genetics 100(3), 428–443.
- Claussnitzer et al. (2015) Claussnitzer, M., S. N. Dankel, K.-H. Kim, G. Quon, W. Meuleman, C. Haugen, V. Glunk, I. S. Sousa, J. L. Beaudry, V. Puviindran, et al. (2015). Fto obesity variant circuitry and adipocyte browning in humans. New England Journal of Medicine 373(10), 895–907.
- Deng and Pan (2020) Deng, Y. and W. Pan (2020). A powerful and versatile colocalization test. PLoS computational biology 16(4), e1007778.
- Donoho and Jin (2004) Donoho, D. and J. Jin (2004). Higher criticism for detecting sparse heterogeneous mixtures. Annals of Statistics 32(3), 962–994.
- Donoho and Jin (2008) Donoho, D. and J. Jin (2008). Higher criticism thresholding: Optimal feature selection when useful features are rare and weak. Proceedings of the National Academy of Sciences 105(39), 14790–14795.
- Gamazon et al. (2015) Gamazon, E. R., H. E. Wheeler, K. P. Shah, S. V. Mozaffari, K. Aquino-Michaels, R. J. Carroll, A. E. Eyler, J. C. Denny, G. Consortium, D. L. Nicolae, et al. (2015). A gene-based association method for mapping traits using reference transcriptome data. Nature genetics 47(9), 1091–1098.
- Gusev et al. (2016) Gusev, A., A. Ko, H. Shi, G. Bhatia, W. Chung, B. W. Penninx, R. Jansen, E. J. De Geus, D. I. Boomsma, F. A. Wright, et al. (2016). Integrative approaches for large-scale transcriptome-wide association studies. Nature genetics 48(3), 245–252.
- Han et al. (2023) Han, J., H. Fan, Y. Dai, and X. Cheng (2023). Serum c1q/tnf-related protein 4 levels are associated with nonalcoholic fatty liver disease in type 2 diabetic patients. Metabolic Syndrome and Related Disorders 21(3), 163–168.
- Jager and Wellner (2007) Jager, L. and J. A. Wellner (2007). Goodness-of-fit tests via phi-divergences.
- Jeng et al. (2013) Jeng, X. J., T. T. Cai, and H. Li (2013). Simultaneous discovery of rare and common segment variants. Biometrika 100(1), 157–172. PMCID:PMC3696347.
- Jeng et al. (2020) Jeng, X. J., J. Rhyne, T. Zhang, and J.-Y. Tzeng (2020). Effective snp ranking improves the performance of eqtl mapping. Genetic epidemiology 44(6), 611–619.
- Laakso et al. (2017) Laakso, M., J. Kuusisto, A. Stančáková, T. Kuulasmaa, P. Pajukanta, A. J. Lusis, F. S. Collins, K. L. Mohlke, and M. Boehnke (2017). The metabolic syndrome in men study: a resource for studies of metabolic and cardiovascular diseases. Journal of lipid research 58(3), 481–493. PMID:28119442.
- Li and Siegmund (2015) Li, J. and D. Siegmund (2015). Higher criticism: p-values and criticism. Annals of Statistics 43(3), 1323–1350.
- Liu et al. (2019) Liu, X., Y. I. Li, and J. K. Pritchard (2019). Trans effects on gene expression can drive omnigenic inheritance. Cell 177(4), 1022–1034.
- Lobo et al. (2021) Lobo, S., P. R. Benusiglio, F. Coulet, L. Boussemart, L. Golmard, I. Spier, R. Hüneburg, S. Aretz, C. Colas, and C. Oliveira (2021). Cancer predisposition and germline ctnna1 variants. European journal of medical genetics 64(10), 104316.
- Price et al. (2008) Price, A. L., N. Patterson, D. C. Hancks, S. Myers, D. Reich, V. G. Cheung, and R. S. Spielman (2008). Effects of cis and trans genetic ancestry on gene expression in african americans. PLoS genetics 4(12), e1000294.
- Raulerson et al. (2019) Raulerson, C. K., A. Ko, J. C. Kidd, K. W. Currin, S. M. Brotman, M. E. Cannon, Y. Wu, C. N. Spracklen, A. U. Jackson, H. M. Stringham, et al. (2019). Adipose tissue gene expression associations reveal hundreds of candidate genes for cardiometabolic traits. The American Journal of Human Genetics 105(4), 773–787.
- Smemo et al. (2014) Smemo, S., J. J. Tena, K.-H. Kim, E. R. Gamazon, N. J. Sakabe, C. Gómez-Marín, I. Aneas, F. L. Credidio, D. R. Sobreira, N. F. Wasserman, et al. (2014). Obesity-associated variants within fto form long-range functional connections with irx3. Nature 507(7492), 371–375. PMCID:PMC4113484.
- Sun and Lin (2020) Sun, R. and X. Lin (2020). Genetic variant set-based tests using the generalized berk–jones statistic with application to a genome-wide association study of breast cancer. Journal of the American Statistical Association 115(531), 1079–1091.
- Visscher et al. (2017) Visscher, P. M., N. R. Wray, Q. Zhang, P. Sklar, M. I. McCarthy, M. A. Brown, and J. Yang (2017). 10 years of gwas discovery: biology, function, and translation. The American Journal of Human Genetics 101(1), 5–22. PMCID: PMC5501872.
- Westra et al. (2013) Westra, H.-J., M. J. Peters, T. Esko, H. Yaghootkar, C. Schurmann, J. Kettunen, M. W. Christiansen, B. P. Fairfax, K. Schramm, J. E. Powell, et al. (2013). Systematic identification of trans eqtls as putative drivers of known disease associations. Nature genetics 45(10), 1238–1243.
- Yao et al. (2017) Yao, C., R. Joehanes, A. D. Johnson, T. Huan, C. Liu, J. E. Freedman, P. J. Munson, D. E. Hill, M. Vidal, and D. Levy (2017). Dynamic role of trans regulation of gene expression in relation to complex traits. The American Journal of Human Genetics 100(4), 571–580. PMCID:PMC5384035.
- Yao et al. (2023) Yao, Z., Y. Gong, W. Chen, S. Shao, Y. Song, H. Guo, Q. Li, S. Liu, X. Wang, Z. Zhang, et al. (2023). Upregulation of wdr6 drives hepatic de novo lipogenesis in insulin resistance in mice. Nature metabolism 5(10), 1706–1725.
- Zhang et al. (2020) Zhang, H., J. Jin, and Z. Wu (2020). Distributions and power of optimal signal-detection statistics in finite case. IEEE Transactions on Signal Processing 68, 1021–1033.
- Zhu et al. (2016) Zhu, Z., F. Zhang, H. Hu, A. Bakshi, M. R. Robinson, J. E. Powell, G. W. Montgomery, M. E. Goddard, N. R. Wray, P. M. Visscher, et al. (2016). Integration of summary data from gwas and eqtl studies predicts complex trait gene targets. Nature genetics 48(5), 481. PMID:27019110.