Completed on 11 Mar 2016 by John Didion.
We reviewed this paper in our February 2016 preprint journal club. First, we found the research question interesting and important: if a substantial fraction of ethnicity is explained by non-genetic effects, then this is clinically relevant information and should be taken into account during treatment, drug development and testing, etc. Our main concern is that the study design makes it difficult to rule out that associations with Puerto Rican ancestry are due to environmental effects, since nearly 90% of the self-identified Puerto Ricans and none of the self-identified Mexicans were recruited in Puerto Rico. The authors seem to recognize this problem, because at several points they either test for association with recruitment site or correct for recruitment site in tests of association between ethnicity and methylation. However, we suspect that if, instead of using recruitment site as a multi-value or continuous covariate, the authors used “recruitment site == Puerto Rico” as a binary covariate, some of the significant associations between methylation and ethnicity might go away. If we were reviewers on this paper, we would ask for that additional analysis. Similarly, trying to identify methylation effects of Puerto Rican ethnicity that are independent of environmental differences particular to Puerto Rico (perhaps there is a different smoking rate or level of air pollution there than at the other recruitment sites?) is problematic given this study’s data set.
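To make the suggested recoding concrete, here is a minimal sketch of collapsing recruitment site into a single “recruited in Puerto Rico” indicator and cross-tabulating it against self-identified ethnicity. The function names, the site labels `site_A` and `site_B`, and the toy sample are our own assumptions for illustration; they are not taken from the paper's data or code.

```python
from collections import Counter

def binary_site_indicator(sites, target="Puerto Rico"):
    """Collapse a multi-level site label into 1 (target site) or 0 (any other site)."""
    return [1 if site == target else 0 for site in sites]

def crosstab(ethnicity, indicator):
    """Count samples in each (ethnicity, indicator) cell."""
    return Counter(zip(ethnicity, indicator))

# Toy data, invented purely for illustration (NOT the study's sample).
ethnicity = ["Puerto Rican"] * 10 + ["Mexican"] * 10
sites = ["Puerto Rico"] * 9 + ["site_A"] + ["site_A"] * 5 + ["site_B"] * 5

in_puerto_rico = binary_site_indicator(sites)
table = crosstab(ethnicity, in_puerto_rico)

for (group, flag), count in sorted(table.items()):
    print(f"{group:>12}  recruited_in_PR={flag}  n={count}")
```

In the real analysis the indicator column would replace the multi-level site term in the association model. A table that is nearly diagonal, as in this toy example, shows how little information the data carry for separating ethnicity from recruitment in Puerto Rico.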
Another analysis that we think is important when comparing results to previously reported findings is testing whether the effect sizes and directions are consistent. For example, in the “Ethnic differences in environmentally-associated methylation sites” section, do the 19 nominally significant loci that were previously identified in a study of Norwegian newborns have the same direction of methylation change between the two studies? This would require you to know the smoking rate among your sample populations, but you could use the population smoking rates at the recruitment sites as a reasonable proxy.
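One simple way to run the consistency check we have in mind is a sign-concordance count followed by an exact binomial test against the 50% agreement expected by chance. The sketch below assumes per-locus effect estimates are available from both studies; the function names and example values are hypothetical and are not results from either paper.

```python
from math import comb

def sign_concordance(effects_a, effects_b):
    """Return (k, n): k loci with the same effect sign out of n loci
    where both studies report a non-zero effect."""
    if len(effects_a) != len(effects_b):
        raise ValueError("effect lists must describe the same loci")
    pairs = [(a, b) for a, b in zip(effects_a, effects_b) if a != 0 and b != 0]
    concordant = sum(1 for a, b in pairs if (a > 0) == (b > 0))
    return concordant, len(pairs)

def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value: total probability of all outcomes
    that are no more likely than the observed count k."""
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-9)))

# Hypothetical effect estimates for three shared loci (illustration only).
k, n = sign_concordance([0.10, -0.20, 0.30], [0.20, -0.10, -0.40])
print(k, n, binomial_two_sided_p(k, n))
```

A sign test ignores effect magnitude, so it is a coarse first check; correlating the effect sizes across the shared loci would be a natural complement.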
Some minor comments:
• The cis-meQTL analysis is certainly important, but it would be nice to know whether you tested for trans effects, and whether any loci came up significant.
• We found it a bit odd that Bonferroni correction was used rather than the now more common FDR control. Does the number of significant associations change when using FDR <= 0.05 rather than a p-value threshold?
• For figures 1-3, the A panels are genome-wide analyses while the remainder of the panels focus on a specific locus. The A panels should either be split into separate figures, or each panel should be very clearly labeled with a title indicating what is being shown.
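Regarding the Bonferroni comment above, the comparison we are asking about can be sketched as follows. We assume the Benjamini-Hochberg step-up procedure as the FDR method, which is a common choice but our assumption here; the p-values are invented for illustration and are not from the paper.

```python
def bonferroni_reject(pvalues, alpha=0.05):
    """Reject each hypothesis whose p-value is at most alpha / m."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]

def benjamini_hochberg_reject(pvalues, alpha=0.05):
    """Benjamini-Hochberg step-up: find the largest rank r such that
    p_(r) <= alpha * r / m, then reject the r smallest p-values."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= alpha * rank / m:
            max_rank = rank
    reject = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            reject[idx] = True
    return reject

# Made-up p-values, for illustration only.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
n_bonferroni = sum(bonferroni_reject(pvals))
n_bh = sum(benjamini_hochberg_reject(pvals))
print(n_bonferroni, n_bh)
```

Because Bonferroni controls the family-wise error rate, it never rejects more hypotheses than Benjamini-Hochberg at the same alpha, so reporting both counts shows how much the choice of correction matters for the number of significant loci.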