Preprint reviews by Torsten Seemann

plasmidSPAdes: Assembling Plasmids from Whole Genome Sequencing Data

Dmitry Antipov, Nolan Hartwick, Max Shen, Mikhail Raiko, Alla Lapidus, Pavel Pevzner

Review posted on 16th April 2016

I think this tool does have potential to improve plasmid assembly and recovery.

I have some comments:

(1) I feel the manuscript overstates the problem of plasmids being missed. They are usually present in the contigs (and graph), albeit fragmented and sometimes joined to the chromosome graph, and can be baited by looking for classical plasmid genes (rep etc).

(2) The text has no actual description of the algorithm, nor any structured results.

(3) The code/binary uses the same executable names as regular SPAdes, so it is difficult to install alongside regular SPAdes.

(4) It seemed to ignore my --threads parameter (at least for the hammer stage).

(5) I fed it a challenging multi-plasmid data set of GAI data (PE 36 bp) and it failed on "-k auto" because it said k=55 was bigger than the read length. I assumed "-k auto" would know the read length, as it had just indexed/corrected all the reads?
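
As an illustration of comment (5), here is a minimal Python sketch of how an automatic k selection could take the read length into account. The candidate list and the function name `pick_k_values` are hypothetical and are not taken from the SPAdes code; the only assumption is that a k-mer size must be smaller than the longest read.

```python
def pick_k_values(read_lengths, candidates=(21, 33, 55)):
    """Return the candidate k-mer sizes that fit inside the longest read.

    `candidates` is a hypothetical list chosen for illustration only.
    """
    longest = max(read_lengths)
    usable = [k for k in candidates if k < longest]
    if not usable:
        raise ValueError(f"no candidate k is smaller than read length {longest}")
    return usable


# 36 bp reads, as in the data set described in comment (5)
print(pick_k_values([36, 36, 36]))  # prints [21, 33]
```

With 36 bp reads the sketch drops k=55 instead of failing on it, which is the behaviour I would have expected from an automatic setting.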

I look forward to further development of plasmidSPAdes and the integration of the methods into regular SPAdes.

NxRepair: Error correction in de novo sequence assembly using Nextera mate pairs

Rebecca R Murphy, Jared M O'Connell, Anthony J Cox, Ole B Schulz-Trieglaff

Review posted on 14th January 2015

Basic reporting

- The introduction refers to "assembly errors" but does not distinguish between types of errors, like SNPs, indels, or contig joining mistakes
- No explanation of "insert size", "mate pair" or "paired end" is given; many readers may not understand these concepts
- No reference for "Nextera Mate Pair" is given
- Existing tools REAPR and ALE are described, and a "Bayesian" method is mentioned, but no motivation is given for mentioning it
- The phrase "de novo" should be italicised
- "de Bruijn Graph" should be lowercase "graph"
- missing space at "W is 200 bases"
- the interval [i-W, i+W] spans 2W+1 positions, not 2W as reported
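
A quick check of the last point, as a Python sketch. The value of i is an arbitrary example and W = 200 follows the text quoted above; the interval is assumed to be closed at both ends and to cover integer base positions.

```python
def window_positions(i, W):
    """Number of integer positions in the closed interval [i - W, i + W]."""
    return len(range(i - W, i + W + 1))


W = 200
print(window_positions(1000, W))  # prints 401, i.e. 2W + 1
```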

Experimental design

- It is not clear if you re-sequenced the exact same strains as the reference genomes in NCBI and where these strains were obtained from.
- Versions of software (bwa, samtools, etc) need to be reported
- BWA was used with default parameters, which produces many partially mapped reads and alternative mappings. It is unclear how NxRepair handled these.
- It should be made clearer that you are using the same reads for both assembly and post-assembly correction
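
One way the partially mapped and alternative alignments could be handled is sketched below in Python. This is not a description of what NxRepair does. The flag bits are those defined in the SAM specification; the function name `keep_alignment`, the MAPQ threshold of 30 and the example record are hypothetical choices.

```python
# Flag bits from the SAM specification
UNMAPPED = 0x4
MATE_UNMAPPED = 0x8
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def keep_alignment(sam_line, min_mapq=30):
    """Return True for a fully aligned, primary, confidently placed read.

    `min_mapq` is a hypothetical threshold chosen for illustration.
    """
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])
    mapq = int(fields[4])
    cigar = fields[5]
    # Drop unmapped reads, reads with an unmapped mate, and non-primary records
    if flag & (UNMAPPED | MATE_UNMAPPED | SECONDARY | SUPPLEMENTARY):
        return False
    # Drop partially mapped reads (soft- or hard-clipped)
    if "S" in cigar or "H" in cigar:
        return False
    return mapq >= min_mapq


# Made-up SAM record: properly paired, MAPQ 60, fully aligned 36 bp read
record = "\t".join(["r1", "99", "contig1", "100", "60", "36M",
                    "=", "4000", "3936", "*", "*"])
print(keep_alignment(record))  # prints True
```

Stating which of these filters (if any) were applied would answer the question about BWA defaults.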

Validity of the findings

- The sequencing data is only available on Illumina BaseSpace. This needs to be rectified by placing the reads into a Study on NCBI SRA or into ENA so they are guaranteed to be publicly available.
- Table 1 can be improved by adding in the full species name, the genome size, and the global mate pair statistics that were estimated
- Some measure of the yield, quality and average read length (after clipping) should be provided
- It is claimed that NxRepair fixed 6 of 9 genomes, but Table 1 shows changes to only 3 of the 9 genomes?

Comments for the author

- Could this method be incorporated into SPAdes? SPAdes already re-aligns the reads back with BWA to correct some errors, so adding in an MP consistency check would be good.
- Do you really need the interval tree data structure, or could the stats you need be computed in a 1-pass manner?
- The use of a uniform distribution for the non-MP reads was interesting. I would have thought most non-MP reads were shadow PE reads, so their distribution would be Gaussian with a low mean and smaller standard deviation, rather than uniform.
- When you break an identified mis-assembly, the trimming part concerns me. Does this mean you are removing a chunk of genomic DNA from the final result? So we could lose genes?
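
The 1-pass question can be illustrated with Welford's online algorithm, which keeps a running mean and variance without storing the observations. The Python sketch below covers only global insert size statistics over made-up values, and the class name `RunningStats` is hypothetical; whether the per-window statistics the method needs can be computed the same way remains the open question.

```python
import math


class RunningStats:
    """Welford's online algorithm for mean and sample standard deviation."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def add(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def stdev(self):
        # Sample standard deviation; undefined for fewer than two values
        return math.sqrt(self._m2 / (self.n - 1)) if self.n > 1 else 0.0


stats = RunningStats()
for insert_size in [3900, 4100, 4000, 4050, 3950]:  # made-up insert sizes
    stats.add(insert_size)
print(stats.mean, stats.stdev)
```

Each observation is seen once and then discarded, so memory use stays constant regardless of the number of read pairs.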