Abstract
Estimation of heritability is an important task in genetics. The use of linear mixed models (LMMs) to determine narrow-sense SNP-heritability and related quantities has received much recent attention, owing to their ability to account for variants with small effect sizes. Typically, heritability estimation under LMMs uses the restricted maximum likelihood (REML) approach. The common way to report the uncertainty in REML estimation uses standard errors (SEs), which rely on asymptotic properties. However, the assumptions behind these asymptotic properties are often violated because of the bounded parameter space, statistical dependencies, and limited sample size, leading to biased estimates and inflated or deflated confidence intervals. In addition, for larger datasets (e.g., tens of thousands of individuals), the construction of SEs itself may take considerable time, as it requires expensive matrix inversions and multiplications.
Here, we present FIESTA (Fast confidence IntErvals using STochastic Approximation), a method for constructing accurate confidence intervals (CIs). FIESTA is based on parametric bootstrap sampling, and therefore avoids unjustified assumptions on the distribution of the heritability estimator. FIESTA uses stochastic approximation techniques, which accelerate the construction of CIs by several orders of magnitude, compared to previous approaches as well as to the analytical approximation used by SEs. FIESTA builds accurate CIs rapidly, e.g., requiring only several seconds for datasets of tens of thousands of individuals, making FIESTA a very fast solution to the problem of building accurate CIs for heritability for all dataset sizes.
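To give a flavour of the stochastic approximation idea mentioned in the abstract, the following is a minimal sketch of a generic Robbins-Monro recursion that tracks a quantile of a sampling distribution one draw at a time, instead of generating and sorting a large batch of bootstrap replicates. It is not FIESTA's actual algorithm. The function name, the step-size constants `c` and `n0`, and the Gaussian `toy_replicate` sampler (a stand-in for "simulate a phenotype and re-estimate heritability") are all assumptions made for illustration.

```python
import random


def robbins_monro_quantile(draw, alpha, q0, c=1.0, n0=50, n_iter=100_000):
    """Track the alpha-quantile of draw() with a Robbins-Monro recursion.

    draw   : zero-argument callable returning one (bootstrap) replicate
    alpha  : target probability level, 0 < alpha < 1
    q0     : starting guess, e.g. the point estimate
    c, n0  : step-size constants (hypothetical tuning knobs)
    """
    q = q0
    for n in range(1, n_iter + 1):
        hit = 1.0 if draw() <= q else 0.0
        # Move q up when the draw lands above it, down when it lands below;
        # the moves balance out when a fraction alpha of draws fall below q.
        q += c / (n + n0) * (alpha - hit)
    return q


# Toy stand-in for one parametric bootstrap replicate of an estimator
# (assumption: Gaussian with mean 0.3 and standard deviation 0.1).
rng = random.Random(0)


def toy_replicate():
    return rng.gauss(0.3, 0.1)


lower = robbins_monro_quantile(toy_replicate, 0.05, q0=0.3)
upper = robbins_monro_quantile(toy_replicate, 0.95, q0=0.3)
print(f"approximate 90% interval: ({lower:.3f}, {upper:.3f})")
```

For this Gaussian toy sampler the exact 5% and 95% quantiles are 0.3 ± 1.645 × 0.1, so the two recursions should settle close to those values. Each iteration uses a single draw, which is why schemes of this kind can be much cheaper than a full bootstrap when every replicate is expensive to compute.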
Acknowledgements
The authors would like to thank David Steinberg. R.S. is supported by the Colton Family Foundation. This study was supported in part by a fellowship from the Edmond J. Safra Center for Bioinformatics at Tel Aviv University to R.S. The Northern Finland Birth Cohort data were obtained from dbGaP: phs000276.v2.p1. This study makes use of data generated by the Wellcome Trust Case Control Consortium. A full list of the investigators who contributed to the generation of the data is available from www.wtccc.org.uk. Funding for the project was provided by the Wellcome Trust under award 076113.
Appendix
The supplementary material, including additional figures, is located at https://github.com/cozygene/albi.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Schweiger, R., Fisher, E., Rahmani, E., Shenhav, L., Rosset, S., Halperin, E. (2017). Using Stochastic Approximation Techniques to Efficiently Construct Confidence Intervals for Heritability. In: Sahinalp, S. (eds) Research in Computational Molecular Biology. RECOMB 2017. Lecture Notes in Computer Science, vol 10229. Springer, Cham. https://doi.org/10.1007/978-3-319-56970-3_15
DOI: https://doi.org/10.1007/978-3-319-56970-3_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-56969-7
Online ISBN: 978-3-319-56970-3
eBook Packages: Computer Science (R0)