Review for "Disease variants alter transcription factor levels and methylation of their binding sites"

Completed on 12 Jan 2016 by NIH/NHGRI preprint journal club and John Didion.

Comments to author

We reviewed this paper in our preprint-focused journal club at NIH/NHGRI. Generally, we were very impressed with the depth of the data set, and with the care taken in the choice of analytical approaches.

We recommend adding a section to the introduction explaining the different models that might explain relationships between SNPs, gene expression, and methylation, and which (if any) the authors, at the outset of their study, hypothesized to explain all (or the majority) of associations. For example, was TF binding expected to cause changes in methylation within/near the binding site (and if so, how), and/or was methylation expected to disrupt TF binding?

One substantial concern we had was with the use of language that implies causality. The authors found significant correlations in their eQTL, meQTL, and eQTM analyses that allow testable hypotheses and working models to be generated, but without functional validation, causality cannot be determined. For example, the word “affects” on line 158 should be replaced with “is associated with.”

One obvious analysis we expected to see was a test of association between eQTLs for genes that encode methyltransferases and methyl-binding proteins, and the methylation of the targets of those proteins (or global methylation levels, in the case of non-specific methyltransferases). If such an association was looked for and not found, the authors should say so (in the supplement, at the very least). Other associations the authors could probe for, though they may be outside the scope of the paper, involve non-coding RNAs (especially in light of the findings in Lemire et al.) and small RNAs.

Minor comments:

• The figure legends need to be more informative, especially for figure 1. It was very difficult to understand what was going on in the panel below 1A/B.

• The pie chart in figure 1D is difficult to interpret. Please consider a bar chart instead.

• The term meQTL is never defined. It would be especially helpful to have a sentence distinguishing usage that refers to a particular SNP (which may be associated with multiple CpGs) from usage that refers to individual SNP-CpG pairs.

• In figure 4a, methylation levels are shown relative to the minor allele for each SNP. In the text, however, alleles are referred to as risk or non-risk, and it is never stated whether the risk allele is the minor allele. We suggest modifying the figure to display values in terms of risk alleles instead.

• It would be helpful to mention how much of the genome is being interrogated by the current method. The authors may be able to speculate, or to predict from whole-genome bisulfite sequencing data generated in other studies, how much they are missing by using only sites probed by the 450k array.
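
To illustrate the meQTL comment above, here is a minimal sketch of the two counts we have in mind. The SNP and CpG identifiers are hypothetical placeholders, not values from the manuscript, and the function name `summarize_meqtls` is our own. The point is only that the number of meQTL SNPs and the number of SNP-CpG pairs can differ, so the text should state which one is being reported.

```python
from collections import defaultdict

# Hypothetical significant associations as (SNP id, CpG id) tuples.
# These identifiers are made up for illustration only.
pairs = [
    ("snpA", "cpg1"),
    ("snpA", "cpg2"),  # the same SNP associated with a second CpG
    ("snpB", "cpg2"),  # the same CpG associated with a second SNP
    ("snpC", "cpg3"),
]


def summarize_meqtls(pairs):
    """Count distinct SNP-CpG pairs, distinct SNPs, and distinct CpGs."""
    cpgs_by_snp = defaultdict(set)
    for snp, cpg in pairs:
        cpgs_by_snp[snp].add(cpg)
    return {
        "n_pairs": len(set(pairs)),                 # individual SNP-CpG pairs
        "n_snps": len(cpgs_by_snp),                 # SNPs with at least one associated CpG
        "n_cpgs": len({cpg for _, cpg in pairs}),   # CpGs with at least one associated SNP
    }


summary = summarize_meqtls(pairs)
print(summary)
```

In this toy input there are more pairs than SNPs, so a sentence such as "we identified N meQTLs" would be ambiguous without a definition.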