Review for "Mammalian genomic regulatory regions predicted by utilizing human genomics, transcriptomics and epigenetics data"

Completed on 11 Jul 2017 by Kyle Schachtschneider.

Comments to author

This manuscript describes the creation and use of a novel bioinformatics tool, the Human Projection of Regulatory Regions (HPRS), which uses publicly available human regulatory datasets to predict regulatory regions in other mammalian species. The pipeline is first optimized and then used to identify regulatory regions (e.g. promoters and enhancers) in the cattle genome, which are then confirmed using publicly available bovine datasets. The pipeline is then applied to 9 additional mammalian species, and the authors demonstrate its utility for identifying mechanisms underlying the effects of SNPs located in non-coding regions identified through previously published GWAS. Overall, the HPRS pipeline represents a highly valuable tool for interpreting previous as well as future results focused on genomic and epigenomic variation underlying phenotypes, and its value will only increase as additional mammalian regulatory datasets become available, for example through the ongoing FAANG initiative. My specific comments are listed below:

1. The authors use the term non-model species to describe the species utilized. However, pigs, as well as some of the other species listed in Table 4, are commonly used as biomedical models. The authors should consider changing the language used.

2. In Table 4, it would be beneficial for the authors to provide a filtered dataset for mouse as well. Because a large number of regulatory datasets are available for mice, including this analysis would help demonstrate the potential benefit of having more species-specific datasets when performing filtering, compared to pigs and cattle, for which relatively few are available. This would help demonstrate the ability of the pipeline to integrate large-scale datasets with human data as they become available from groups such as FAANG, as the authors suggest in the conclusions.

3. In Table 1, some percentages are missing and should be added.

4. The spacing in the header of Table 2 should be fixed; it is currently difficult to determine what the title of each column is.

5. The legend for Figure 3 should be revised. 3a and 3b appear to be switched. Also, it is not clear what the difference between 3c and 3d is. Is one focused on promoters and the other on enhancers?