Review for "Streaming algorithms for identification of pathogens and antibiotic resistance potential from real-time MinION sequencing"

Completed on 17 May 2016 by Willem van Schaik.


Comments to author

Author response is in blue.

I have previously reviewed this manuscript, and my report and the authors' response to my previous comments can be found here.

I believe this revised version is an improvement over the previous version I reviewed, but a number of concerns remain to be addressed. I remain poorly qualified to assess the bioinformatic and computational approaches and therefore focus on the interpretation of the data.

In my opinion, the section on MLST still needs some editing. Bacterial typing is a term that describes the use of methods to identify relatedness between strains. MLST is a sequence-based typing method that was first introduced in the late 1990s and has been widely used since. It has created a vocabulary which describes high-risk 'clones' based on their sequence type (ST). However, for MLST it is essential that STs are assigned with absolute certainty, and clearly this is not the case in this dataset: based on the MinION data, the STs of two of the three strains cannot be resolved. I feel the only valid conclusion from these analyses is that the method to identify sequence types may work, but that sequence quality or sequence coverage is essential and needs to be higher than in the datasets generated as part of this study.

The main point of this section, as the reviewer pointed out, is that while MLST using MinION sequencing shows promising results, it cannot assign an ST with absolute certainty. This prompted us to design a streaming algorithm for strain typing based on the presence and absence of genes, described in the next section. We have made this point clearer in this revision (Page 5, column 2: "While the results were encouraging, this also suggested that traditional MLST with nanopore sequencing requires high coverage to report the sequence type with absolute certainty.")

The next section ('strain typing by presence or absence of genes') is confusing. The authors frequently use the term 'strain type' where they should have used 'sequence type' (e.g. in the line 'and identified their strain types using the relevant MLST schemes'), so please check carefully where 'strain type' should be replaced with 'sequence type' and where 'strain type' means something other than 'sequence type'. The authors set up a method to classify strains on the basis of the presence/absence of genes. I expressed my concerns about this methodology in my review of the previous version of the manuscript, and the authors responded to these concerns as follows: 'The gene presence/absence typing approach is designed to provide preliminary strain information extremely rapidly, using both 1D and 2D reads. It is primarily designed for the situation in which an exemplar strain has already been sequenced. We argue that this does have applicability, for example in an outbreak situation where it is very useful to know if a known strain is present in a new sample.' I believe this is a valid potential application, but it is not well explained in the manuscript and this point should be made more clearly. The limitation of the approach (i.e. that gene content does not necessarily correlate with sequence type) should also be highlighted.

We have edited throughout to use the word “sequence type” when appropriate.

In the abstract we say 'While strain identification with multi-locus sequencing typing required more than 15x coverage to generate confident assignments, our novel gene-presence typing could detect the presence of a known strain with 0.5x coverage' to emphasise the restriction to typing known strains.

We also added the following sentence where we first introduce the approach:
“This approach is intended to rapidly identify the presence of a sequence type which has already been characterised, for example in an outbreak scenario, with subsequent confirmation using MLST typing once more data has been collected.”

We also added the following sentence to the discussion: “Thus it would be ideally suited for rapidly typing a known strain in an outbreak scenario.”

Fig 1 adds little information because the key steps of the algorithms (the arrows to 'species typing', 'strain typing' and 'resistance profile') are not explained in any detail. I believe that adding a schematic overview of this part of the pipeline would make this figure considerably more informative.

We have added a schematic overview of our algorithms in this figure and have revised the caption to explain the purpose of the arrows.

Minor comments:

The discussion section is long and unfocused.

The discussion has been reorganised and shortened. We apologise that paragraphs added to address specific comments from reviewers had not subsequently been properly integrated.

In particular, we have moved the paragraph comparing our approach to MetaPhlAn to the 'Comparison to different algorithms' section.

Fig 3 serves little purpose in my opinion and may be better placed in the Supplementary data.

We agree with the reviewer and have moved Fig 3 to the supplementary figures.

Fig 6. Panel c) text is very difficult to read.

Fig 6c is a screenshot from the Metrichor What's In My Pot (WIMP) service and hence is not editable. We have moved it to Supplementary Figure 2 so that readers can view it at higher resolution, and we discuss the results in the text.

p. 1, l. 52: correct 'when to when to'

p. 2, l. 57: write 'quasipneumoniae'

p. 8, l. 44: correct 'the our real-time analysis'

p. 16, l. 40: write 'an affine gap'

The above typos have been corrected.