Completed on 31 Jul 2017 by Anshul Kundaje. Sourced from http://www.biorxiv.org/content/early/2017/07/31/170761.
This is an interesting fusion of kernel methods and DNNs.
However, I have a few issues with the results and some suggestions for improvement. Hopefully you find them useful.
1. Reporting performance on randomly sampled negatives inflates the performance estimates. The genome contains much harder negatives than random background, e.g. GC- or dinucleotide-matched negatives, or open chromatin sites.
2. Most of the paper reports performance on balanced negative sets. In reality, there are orders of magnitude more negatives than positives. The authors should report performance on entire held-out chromosomes, which reflects how this program would be used in practice. It is in this unbalanced setting that auPRC, precision/recall and F1 will diverge significantly from auROC and from the performance observed on balanced datasets. You do perform some experiments with increasing numbers of negative examples, but you then report auROC, which does not account for the class ratio and can trivially increase as you add more (mostly easy) negatives. I would suggest instead reporting imbalance-sensitive metrics (auPRC, precision/recall, F1) on entire held-out chromosomes as a function of the class imbalance used in training.
3. You seem to have optimized the hyperparameters of gkm-DNN, but it is not clear whether you did the same for gkm-SVM (it looks like you used default parameters).
4. A comparison to optimized CNN architectures trained on raw sequence would also be useful, to show whether there is an advantage to using gk-fvs over learning directly from a one-hot encoded representation.
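To make point 1 concrete, here is a minimal sketch of GC-matched negative sampling. It assumes you already have a pool of candidate background sequences of the same length as the positives; the function names and the choice of 20 GC bins are my own illustrative choices, not anything from the paper. Dinucleotide-matched negatives would instead need a dinucleotide-preserving shuffle, which is not shown here.

```python
import random
from collections import defaultdict


def gc_fraction(seq):
    """Fraction of G/C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_matched_negatives(positives, candidates, n_bins=20, seed=0):
    """Draw one candidate per positive from the same GC-content bin.

    Candidates are sampled without replacement; a positive whose bin has
    run out of candidates simply gets no matched negative.
    """
    rng = random.Random(seed)

    def bin_of(seq):
        # GC fraction 1.0 would otherwise fall into bin n_bins, so clamp it.
        return min(int(gc_fraction(seq) * n_bins), n_bins - 1)

    pool = defaultdict(list)
    for seq in candidates:
        pool[bin_of(seq)].append(seq)
    for seqs in pool.values():
        rng.shuffle(seqs)

    negatives = []
    for seq in positives:
        bucket = pool[bin_of(seq)]
        if bucket:
            negatives.append(bucket.pop())
    return negatives
```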
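Regarding point 2, here is a toy illustration of why auROC can hide class imbalance. The sketch below, in plain Python (scikit-learn's roc_auc_score and average_precision_score do the same job in practice), scores the same made-up classifier outputs twice: once on a roughly balanced set, and once with every negative replicated 10 times. The scores are invented for illustration only.

```python
def auroc(scores, labels):
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(scores, labels):
    """auPRC estimated as average precision: mean precision at the rank of each positive."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    tp = 0
    total = 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            tp += 1
            total += tp / k
    return total / tp


def labelled(pos, neg):
    """Concatenate positive and negative scores with 1/0 labels."""
    return pos + neg, [1] * len(pos) + [0] * len(neg)


pos_scores = [0.9, 0.8, 0.6, 0.4]
neg_scores = [0.7, 0.5, 0.3, 0.2, 0.1]

balanced = labelled(pos_scores, neg_scores)
imbalanced = labelled(pos_scores, neg_scores * 10)  # same negatives, 10x as many

for name, (s, y) in [("balanced", balanced), ("imbalanced", imbalanced)]:
    print(f"{name}: auROC={auroc(s, y):.3f} auPRC={average_precision(s, y):.3f}")
```

Because replicating the negatives leaves each class's score distribution unchanged, auROC stays at 0.850 in both cases, while auPRC falls from about 0.854 to about 0.599.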
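For point 3, the fairest comparison would give both models the same tuning budget on the same validation split. Below is a minimal, model-agnostic sketch. `train_and_score` is a hypothetical callback you would supply: it fits a model with the given parameters on the training chromosomes and returns, say, auPRC on a held-out validation chromosome. The example grid over gkm-SVM's word length l, number of informative positions k and regularization constant C is purely illustrative; the values are my assumptions, not recommended settings.

```python
from itertools import product

# Illustrative gkm-SVM grid (hypothetical values): word length l,
# informative positions k (k <= l), SVM regularization constant C.
GKM_SVM_GRID = {"l": [8, 10, 12], "k": [4, 6], "C": [0.1, 1, 10]}


def grid_search(train_and_score, grid):
    """Try every parameter combination in `grid`; return (best_params, best_score).

    `train_and_score(**params)` must return a validation score where
    higher is better.
    """
    names = sorted(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = train_and_score(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

Running the same function, with a grid of comparable size, for both gkm-SVM and gkm-DNN would make the comparison in the paper more convincing.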
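For point 4, the baseline input I have in mind is the standard one-hot encoding of each sequence, which a CNN consumes as a length x 4 matrix. A minimal sketch in plain Python follows; mapping ambiguous bases such as N to an all-zero row is a common convention, but it is an assumption here rather than a requirement.

```python
def one_hot(seq, alphabet="ACGT"):
    """Encode a DNA sequence as a list of rows, one row of len(alphabet) per base.

    Bases not in `alphabet` (e.g. N) become all-zero rows.
    """
    index = {base: i for i, base in enumerate(alphabet)}
    encoded = []
    for base in seq.upper():
        row = [0] * len(alphabet)
        if base in index:
            row[index[base]] = 1
        encoded.append(row)
    return encoded
```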
Please correct me if you think I misunderstood anything.