Preprint reviews by Mihaela Pertea

AGOUTI: improving genome assembly and annotation using transcriptome data

Simo V Zhang, Luting Zhuo, Matthew W Hahn

Review posted on 18th February 2016

In this paper the authors introduce a new scaffolder, AGOUTI, that uses RNA-seq data to improve genome assemblies as well as gene annotations. This is a useful tool that will help the scientific community if it does indeed accomplish its goal of being accurate and effective. However, I am not convinced of its accuracy based on the data presented in this manuscript, as I detail below.

My concerns regarding this paper:

1. I think the test data used to evaluate AGOUTI are not realistic. The authors simply fragment the C. elegans genome at random into multiple contigs. Consequently, the contigs given as input to AGOUTI are perfectly assembled fragments of the genome. A real assembly will contain many errors that are not due only to scaffolding but are caused, for instance, by repeats in the genome. There will likely be many gaps, and the contigs might contain many misassemblies as well. The idealized assemblies that the authors simulate will not be encountered in practice, and I believe the accuracies reported here are unrealistic both for the resulting scaffolds and for the resulting gene annotation. I suspect that this ideal input is the reason for the very small number of errors that AGOUTI makes.

A real assembly will heavily affect the accuracy of any gene-finding tool run on it. I suggest that the authors instead start with simulated reads from C. elegans, assemble them using one of the popular de novo assemblers, and then evaluate AGOUTI on the resulting assembly and gene annotation.

2. Another concern I have about AGOUTI's performance relates to the comparison with other scaffolders. A large number of scaffolders are available, but the authors compare against only one of them, because it is the only other one that also uses RNA-seq data. There is no point in using additional information (RNA-seq data) if the resulting accuracy is no better than that of a scaffolder that does not use it. Therefore, the authors cannot simply dismiss scaffolders that do not use RNA-seq data without showing that AGOUTI does a better job than they do.

3. No running times are presented in the manuscript. Users cannot readily tell whether this tool would be efficient to use, especially on larger data sets, compared with other tools. I would have liked to see a comparison of running times with RNAPATH, for instance, since AGOUTI clearly increases the search space compared with RNAPATH.

4. Related to point 3 above, the authors do not specify what AGOUTI does when it has to choose between edges of equal weight. Does it choose one edge at random, or does it consider both possible paths, which would increase the search space even further?
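To make the concern in point 1 concrete, the following hypothetical Python sketch mimics the kind of simulation described there: a reference sequence is cut at random breakpoints. The function name `fragment_genome`, the random toy sequence and the uniform choice of breakpoints are assumptions for illustration, not the authors' actual procedure. By construction, every contig is an exact substring of the reference and the contigs tile it without gaps, so the input contains none of the errors a real assembler would introduce.

```python
import random


def fragment_genome(genome: str, n_contigs: int, seed: int = 0) -> list[str]:
    """Split `genome` into `n_contigs` pieces at random, distinct breakpoints.

    Hypothetical illustration of an idealized simulation: every piece is an
    exact substring of the reference, so the contigs carry no base errors,
    no collapsed repeats and no misjoins.
    """
    if not 1 <= n_contigs <= len(genome):
        raise ValueError("n_contigs must be between 1 and len(genome)")
    rng = random.Random(seed)
    # Choose n_contigs - 1 distinct cut positions strictly inside the sequence,
    # so that every resulting contig is non-empty.
    cuts = sorted(rng.sample(range(1, len(genome)), n_contigs - 1))
    bounds = [0] + cuts + [len(genome)]
    return [genome[start:end] for start, end in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    rng = random.Random(42)
    reference = "".join(rng.choice("ACGT") for _ in range(1000))
    contigs = fragment_genome(reference, n_contigs=20)
    # The "assembly" is perfect: the contigs in order reproduce the reference
    # exactly, and each contig occurs verbatim in it.
    assert "".join(contigs) == reference
    assert all(c in reference for c in contigs)
    print(len(contigs), "error-free contigs")
```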

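To make the question in point 4 concrete, the following hypothetical Python sketch contrasts the two behaviours in question when extending a scaffold through a contig graph. The graph representation, the use of supporting RNA-seq read-pair counts as edge weights, and all function names are assumptions for illustration; they do not describe AGOUTI's actual implementation.

```python
import random

# Hypothetical scaffolding graph: for each contig, candidate successor contigs
# weighted by the number of RNA-seq read pairs supporting the join.
Graph = dict[str, dict[str, int]]


def tied_best(graph: Graph, node: str, visited: set[str]) -> list[str]:
    """Return the unvisited successors of `node` sharing the maximum weight."""
    succ = {n: w for n, w in graph.get(node, {}).items() if n not in visited}
    if not succ:
        return []
    top = max(succ.values())
    return sorted(n for n, w in succ.items() if w == top)


def extend_pick_one(graph: Graph, start: str, rng: random.Random) -> list[str]:
    """Strategy A: break ties by picking one edge at random; one path results."""
    path = [start]
    while True:
        options = tied_best(graph, path[-1], set(path))
        if not options:
            return path
        path.append(rng.choice(options))


def extend_keep_all(graph: Graph, start: str) -> list[list[str]]:
    """Strategy B: follow every tied edge; the number of paths can multiply."""
    paths: list[list[str]] = []
    frontier = [[start]]
    while frontier:
        path = frontier.pop()
        options = tied_best(graph, path[-1], set(path))
        if not options:
            paths.append(path)
        for n in options:
            frontier.append(path + [n])
    return paths


if __name__ == "__main__":
    demo: Graph = {
        "c1": {"c2": 5, "c3": 5, "c9": 1},
        "c2": {"c4": 3, "c5": 3},
        "c3": {"c6": 2, "c7": 2},
    }
    print(extend_pick_one(demo, "c1", random.Random(0)))  # one scaffold
    print(len(extend_keep_all(demo, "c1")))  # 4 candidate scaffolds
```

In this toy graph, two consecutive two-way ties are enough for the keep-all strategy to return four candidate scaffolds where random tie-breaking returns one. The number of candidates can grow with the product of the tie sizes along a path, which is why the answer also bears on the running-time concern in point 3.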
Minor concern:

The authors should explain Table 4 more clearly. It is not clear what the cases in the table represent and how they can occur for non-consecutive contigs.

Level of interest

Please indicate how interesting you found the manuscript:

An article whose findings are important to those with closely related research interests.

Quality of written English

Please indicate the quality of language in the manuscript:

Declaration of competing interests

Please complete a declaration of competing interests, considering the following questions:
1. Have you in the past five years received reimbursements, fees, funding, or salary from an
organisation that may in any way gain or lose financially from the publication of this
manuscript, either now or in the future?
2. Do you hold any stocks or shares in an organisation that may in any way gain or lose
financially from the publication of this manuscript, either now or in the future?
3. Do you hold or are you currently applying for any patents relating to the content of the manuscript?
4. Have you received reimbursements, fees, funding, or salary from an organization that
holds or has applied for patents relating to the content of the manuscript?
5. Do you have any other financial competing interests?
6. Do you have any non-financial competing interests in relation to this paper?
If you can answer no to all of the above, write 'I declare that I have no competing interests'
below. If your reply is yes to any, please give details below.

I declare that I have no competing interests.

I agree to the open peer review policy of the journal. I understand that my name will be included
on my report to the authors and, if the manuscript is accepted for publication, my named report
including any attachments I upload will be posted on the website along with the authors'
responses. I agree for my report to be made available under an Open Access Creative Commons
CC-BY license. I understand that any comments
which I do not wish to be included in my named report can be included as confidential comments
to the editors, which will not be published.

I agree to the open peer review policy of the journal.

Authors' response to reviews:
