Open preprint reviews by David Balding

Comparison of methods that use whole genome data to estimate the heritability and genetic architecture of complex traits.

Luke Evans, Rasool Tahmasbi, Scott Vrieze, Goncalo Abecasis, Sayantan Das, Doug Bjelland, Teresa deCandia, Haplotype Reference Consortium, Mike Goddard, Benjamin Neale, Jian Yang, Peter Visscher, Matthew Keller

The authors find that GCTA performs best among the methods they compare, but this is because their simulation model is unrealistic in one important respect, and that respect matches an unrealistic assumption of the heritability model underlying GCTA. (The authors refer to GCTA as GREML, which is confusing: there are many GREML methods, including both GCTA and LDAK, and they differ in the heritability model they assume.)

L71 "Genetic architecture refers to the number, frequencies, effect sizes, and locations of causal variants (CVs) underlying trait variation"

What is missing here is linkage disequilibrium (LD). The GCTA model assumes that the heritability of SNPs is distributed independently of LD, whereas LDAK assumes an inverse relationship between heritability and LD. Across 42 human GWAS we have shown that the LDAK model provides a much closer fit to reality (1), whereas the Evans et al simulations are based on the GCTA model and not the LDAK model.
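The contrast between the two assumptions can be summarised as follows. This is a simplified sketch written for standardized genotypes that ignores any dependence on allele frequency; it is not the exact parameterization used in (1). Here $m$ is the number of SNPs, $h_j^2$ the heritability contributed by SNP $j$, $h^2_{\mathrm{SNP}}$ the total SNP heritability, and $w_j$ an LD-based weight that is smaller for SNPs in regions of high LD:

```latex
\text{GCTA:}\quad \mathbb{E}[h_j^2] = \frac{h^2_{\mathrm{SNP}}}{m}
\qquad\qquad
\text{LDAK:}\quad \mathbb{E}[h_j^2] = h^2_{\mathrm{SNP}}\,\frac{w_j}{\sum_{k=1}^{m} w_k}
```

Under the first model a SNP in a high-LD region is expected to contribute as much heritability as an isolated SNP. Under the second it is expected to contribute less, so a region's total contribution does not grow simply because it contains many correlated SNPs.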

We previously showed (2,3) that GCTA can give highly biased estimates in regions of low or high LD, whereas LDAK adjusts appropriately. Our recent work (1) was spurred by the misleading results presented in (4), which also compared the performance of GCTA and LDAK using phenotype simulations based on the GCTA model but not the LDAK model. We have pointed out this deficiency to some authors common to both (4) and Evans et al, so it is disappointing to see the same kind of unfair and unrealistic comparisons being made again.

L123 "... simulate phenotypes with differing genomic architectures under realistic patterns of LD structure"

While strictly correct, this is a potentially misleading claim, because there was no attempt to include in the simulations a realistic relationship between LD and phenotype. It is true that the genotype data include realistic levels of LD, which is presumably what the authors intended to claim. However, readers might understand the claim to cover the relationship between LD and phenotype as well, which is much more important and yet has been ignored.

While we do not currently have a precise model for the relationship between LD and per-SNP heritability, in (1) we showed the LDAK model to be more realistic than the GCTA model across a wide range of traits. Hence, at a minimum, any comparison of SNP heritability methods should report results from simulations under both the GCTA and the LDAK models.
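A minimal sketch of what simulating under both models could look like is given below in Python with NumPy. It is an illustration under stated assumptions, not the code used in (1) or by Evans et al. The helper names (`standardize`, `inverse_ld_weights`, `per_snp_variance`, `simulate`) are hypothetical. Genotypes are assumed to be coded 0/1/2, any dependence of heritability on allele frequency is ignored, and the LD weights are a crude inverse LD score standing in for the weights that LDAK actually computes.

```python
import numpy as np


def standardize(X):
    """Centre and scale each SNP (columns of an individuals x SNPs 0/1/2 matrix)."""
    return (X - X.mean(axis=0)) / X.std(axis=0)


def inverse_ld_weights(Z, window=50):
    """Hypothetical stand-in for LD-based weights, NOT the LDAK algorithm:
    w_j = 1 / (sum of r^2 between SNP j and SNPs within +/- window positions),
    so SNPs in high-LD regions receive small weights."""
    n, m = Z.shape
    w = np.empty(m)
    for j in range(m):
        lo, hi = max(0, j - window), min(m, j + window + 1)
        r = Z[:, lo:hi].T @ Z[:, j] / n  # correlations of SNP j with its neighbours
        w[j] = 1.0 / np.sum(r ** 2)      # the sum includes r_jj^2 = 1
    return w


def per_snp_variance(weights, h2, causal):
    """Expected heritability of each SNP: proportional to its weight if causal,
    zero otherwise, scaled so that the total equals h2."""
    v = np.zeros(len(weights))
    v[causal] = np.asarray(weights, dtype=float)[causal]
    return h2 * v / v.sum()


def simulate(Z, h2, weights, n_causal, rng):
    """Simulate y = Z beta + e with beta_j ~ N(0, E[h_j^2]) and Var(e) = 1 - h2."""
    n, m = Z.shape
    causal = rng.choice(m, size=n_causal, replace=False)
    beta = rng.normal(0.0, np.sqrt(per_snp_variance(weights, h2, causal)))
    e = rng.normal(0.0, np.sqrt(1.0 - h2), size=n)
    return Z @ beta + e, causal


# Toy usage: random genotypes, plus one block of 20 SNPs in perfect LD.
rng = np.random.default_rng(1)
X = rng.binomial(2, rng.uniform(0.05, 0.5, size=200), size=(1000, 200)).astype(float)
X[:, 100:120] = X[:, [100]]
Z = standardize(X)
y_gcta, _ = simulate(Z, 0.5, np.ones(Z.shape[1]), 50, rng)    # heritability independent of LD
y_ldak, _ = simulate(Z, 0.5, inverse_ld_weights(Z), 50, rng)  # heritability inversely related to LD
```

Fitting a given heritability estimator to both `y_gcta` and `y_ldak` would then show whether its apparent accuracy depends on which model generated the phenotypes. Because the effects are drawn independently, the expected genetic variance equals `h2` under either weighting, so the two scenarios differ only in how that variance is spread across SNPs in high- and low-LD regions.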

David Balding, University of Melbourne and UCL

Doug Speed, UCL Genetics Institute, London

(1) Speed D, Cai N, UCLEB Consortium, Johnson M, Nejentsev S, Balding D (2017) Re-evaluation of SNP heritability in complex human traits. To appear in Nat Genet; preprint: bioRxiv, doi: 10.1101/074310.

(2) Speed D, Hemani G, Johnson M, Balding D (2012) Improved Heritability Estimation from Genome-Wide SNPs. Am J Human Genet, 91(6): 1011-1021. doi: 10.1016/j.ajhg.2012.10.010

(3) Speed D, Hemani G, Johnson M, Balding D (2013) Response to Lee et al: SNP-Based Heritability Analysis with Dense Data. Am J Human Genet, 93(6): 1155-57.

(4) Yang J, et al. (2015) Genetic variance estimation with imputed variants finds negligible missing heritability for human height and body mass index. Nat Genet, 47: 1114-1120.
