Review for "Estimating heritability without environmental bias"

Completed on 28 Feb 2018

This review refers to the preprint version posted on November 14, 2017 with doi:

This paper presents a novel method to directly estimate heritability after adjusting for environmental bias. Using simulations and comparisons with currently accepted methods, the authors demonstrate the efficacy and unbiasedness of relatedness disequilibrium regression (RDR) in the Icelandic population. Separating direct and indirect genetic effects to help remove more environmental bias is very interesting and has important implications. As large cohorts of families with genotypic data become more common, this could become a standard way of estimating heritability after some further validation.

Comments to author

Overall, the authors did an excellent job describing the theory behind their RDR methodology; this was quite apparent when looking at the supplementary notes. The abstract and introduction built a solid foundation to explore the topic at hand. In general, the flow of the article felt quite appropriate. There were, however, some points that the authors may want to consider updating for easier understanding.

Major Comments:

1.      The definition of heritability that you are using is very specific, but it is not necessarily the same definition used by the other methods to which you compare the RDR method. In this case, you may not be comparing numbers that represent the same thing.

2.      Starts on page 14, last paragraph: When testing for ascertainment bias, you only looked at education level. You explained well why you looked at education level, but is there a reason why that is the only factor that you looked at to determine the presence of ascertainment bias?

3.      Page 16, first paragraph: Was a bias introduced to your estimates because you did not age restrict the sibling regression estimates? We understand that you did not want to lose too many samples, but if the numbers are not representative of the same thing, then your results may not be accurate. Is there a way to do some sort of sensitivity analysis?

Minor Comments:

1.      Page 15, last paragraph and page 18, last paragraph: In the Methods section, you mention that you want your heritability estimates to be for adults and not the elderly, but you do not explain why this is. When talking about the twin studies, you mention that heritability estimates are lower in elderly populations. It would be clearer if you could explain this in the traits measurement section as it comes before the twin study section.

2.      Some of the equations in the “calculation of relatedness matrices” section and in the “simulations using deCODE data” section are not set apart from the surrounding text as clearly as others. We think the paper is easier to read when there is more space between the equations and the surrounding text.

3.      The captions for the tables and figures (main paper and supplementary) are very long and tend to be repetitive of what you say in the methods section. It may be easier to read if you take out a lot of the background from the captions and leave the descriptions in the Methods section.

4.      The term genetic nurturing is a bit confusing due to the positive connotation associated with nurturing and the potential positive and negative effects of genetic nurturing. There are other terms to describe this in the literature. Is there a reason that you picked this one over others that may have fewer connotations associated with it?

5.      It seems like the heritability estimates are one of your main results, but the table showing them is in the supplementary materials. Is there a reason why you put this in the supplementary section instead of the main paper?

6.     Results, 2nd paragraph: Although no gene-by-environment interaction and no parent-of-origin effects (supplementary note) are common assumptions and simplifications, they are probably not true for many human traits. For example, gene-by-environment interaction can play a role in educational attainment. It would be helpful to explore or discuss the effect of violating these assumptions.

7.     It is not very clear to us what the differences and connections are between the indirect genetic effect of the untransmitted allele and the environment to which the offspring are exposed “due to” that untransmitted allele. Is the latter an intermediate step in the former’s effect on the offspring? If so, do we expect to see no indirect genetic effect after controlling for all environments? It would be clearer if you could explain this indirect genetic effect a bit more.

8.    Page 4, last paragraph: The authors demonstrated that relative pairs where one is the direct ancestor of the other comprise a small fraction of the total pairs in the Icelandic sample, and that their effect is negligible. For future applications of RDR, we suggest that the authors give a rough percentage of such pairs in the total sample below which there is no need to remove them. Alternatively, the authors could give some guidance on the scenarios in which the effect of parent-offspring and grandparent-grandchild pairs should be carefully judged.

10.    Page 15 first paragraph under trait measurements: Regarding the ‘education attainment’ variable, having more information on how the “years of schooling” was evaluated would have been helpful. Was this “total” years of schooling or the years of schooling after a certain educational level? Additionally, it may have been helpful to stratify education in terms of years post-secondary education or specify if “trades” were considered as part of education instead of the traditional university/academia route.

11.    Though included in the table, there does not appear to be a description available to demonstrate the differences amongst mean cell hemoglobin (MCH), mean cell hemoglobin concentration (MCHC), and mean cell hemoglobin volume (MCHV).

12.    Page 19 first paragraph: Given that 20% of the height and body mass index (BMI) information was self-reported, were there any tests and subsequent adjustments utilized to determine whether or not there was a difference for that subset of individuals? The authors did not appear to touch on this subject.

13.    Page 15 first paragraph: After regressing out year-of-birth, the authors’ results showed a slight bias towards those with higher socioeconomic status (SES). Given that the RDR method does include separating indirect genetic effects via random segregation, was this SES factor taken into account when looking at the 14 quantitative traits? There does not seem to be a mention of this within the paper.

14.    It is often assumed that Iceland has a fairly homogeneous population. Given that there is no mention of “race” as an adjusting variable, it would have been interesting to touch further on this topic, beyond mentioning this limitation in the discussion section.

15.    Given that the RDR method requires parents of probands to be genotyped and intensive datasets of this type are quite rare, what steps would the authors suggest moving forward?