Review for "Accurate promoter and enhancer identification in 127 ENCODE and Roadmap Epigenomics cell types and tissues by GenoSTAN"

Completed on 15 Apr 2016 by Maxwell W Libbrecht.

Comments to author

Author response is in blue.

Very interesting paper! However, I would like to clarify a few things about Segway:

(1) The manuscript claims that Segway is run on smoothed data, and the authors smooth the data in a 1 kbp window before using it as input to Segway. This smoothing step is not recommended in any of the Segway manuscripts, and was not used for previous Segway runs.

(2) The manuscript claims that Segway assumes that all tracks have the same variance. In fact, Segway uses the same variance parameter for all /labels/, but uses different variance parameters for different tracks. As I understand it, GenoSTAN uses separate variance parameters for all track-label pairs, so this is still a difference between the methods.

(3) Segway is typically run on fold-enrichment tracks that measure the enrichment of reads relative to an input control, not raw reads. This distinction is not mentioned in the manuscript.

(4) On page 3, the manuscript claims that Segway is run on log-transformed data. By default, Segway applies an inverse hyperbolic sine (asinh) transform, and log is not an available transformation within the software.
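
To make the distinction in point (4) concrete, here is a minimal sketch comparing the two transforms on a few made-up signal values. It uses only Python's standard `math` module and is not Segway code; the example values and variable names are assumptions chosen for illustration.

```python
import math

# Illustrative (made-up) signal values, including a zero-coverage position.
signal_values = [0.0, 1.0, 10.0, 1000.0]

# asinh(x) = ln(x + sqrt(x^2 + 1)) is defined for every real x, including 0.
asinh_values = [math.asinh(v) for v in signal_values]

# A plain log transform is undefined at zero: math.log raises ValueError.
try:
    math.log(signal_values[0])
    log_defined_at_zero = True
except ValueError:
    log_defined_at_zero = False

# For large x, asinh(x) approaches ln(2x), so the two transforms differ
# mainly near zero rather than at high signal.
gap_at_1000 = abs(math.asinh(1000.0) - math.log(2 * 1000.0))
```

The practical difference is therefore at low or zero signal, where a log transform would need a pseudocount and asinh does not.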

Maxwell W Libbrecht
Member of the Segway team

Dear Max,

thanks for your comments.

to (1) Smoothing was done for Segway by Hoffman et al. (NAR, 2012), as described in the supplements of the paper on page 2, point (3). We used a larger window size because we worked with 200 bp binned data in the first place, to make it comparable to the data that we used in the other analyses (Hoffman et al. worked at bp resolution). Hence, although the window sizes differed, it did not occur to us that smoothing per se was not recommended for Segway.

to (2) This was a mistake. This will be corrected in the revised manuscript.

to (3): We agree with this. We have tried in the manuscript to distinguish two aspects: 1) comparing algorithms when run on the exact same data and 2) comparing annotations possibly based on different input datasets. Thus, for 1) we run all methods providing input to any method. For 2), we take the original annotations, for which the Segway authors used input as control tracks. In either scenario, we find that the Segway annotation has lower performance compared to ChromHMM and GenoSTAN. We think that both approaches (absolute data or relative enrichment over input) have their drawbacks: fold enrichment over input becomes extremely unstable for regions with low input, which might cause artefacts. Absolute data may not properly correct for DNA accessibility, amplification biases, and mapping biases. Although interesting, a dedicated investigation of the contribution of input data per se was beyond the scope of our study.

to (4) We have actually used the default function (asinh). This will be corrected in the revised manuscript.
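
The instability of fold enrichment at low input mentioned under (3) can be illustrated with a small sketch. The function `fold_enrichment`, its `pseudocount` parameter, and the read counts below are hypothetical and chosen only for illustration; this is not how Segway or any specific pipeline computes its tracks.

```python
def fold_enrichment(signal, control, pseudocount=0.0):
    """Hypothetical fold enrichment: ratio of ChIP signal to input control.

    With pseudocount=0.0 a control of zero would raise ZeroDivisionError,
    which is the extreme case of the instability discussed above.
    """
    return (signal + pseudocount) / (control + pseudocount)

# Same ChIP signal in both regions; the control differs by a single read.
well_covered = [fold_enrichment(20, c) for c in (50, 51)]
low_input = [fold_enrichment(20, c) for c in (1, 2)]

# Change in the ratio caused by one extra control read.
shift_well_covered = abs(well_covered[0] - well_covered[1])
shift_low_input = abs(low_input[0] - low_input[1])
```

With these made-up counts, one extra control read barely moves the ratio in the well-covered region but halves it in the low-input region.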

Best wishes,

Julien Gagneur