Review for "Assemblathon 2: evaluating de novo methods of genome assembly in three vertebrate species"

Completed on 11 Feb 2013 by Mick Watson. Sourced from Publons.



Comments to author

General comments

The authors describe the outputs of “Assemblathon 2”, an international effort to compare genome assemblers, assembly strategies and the usefulness of various assembly metrics. The major conclusions appear to be that most assembly strategies produce useful assemblies, not all strategies work on all types of genome and all types of data, and that no single metric can be used to assess assembly quality.

Clearly a lot of work has gone into this manuscript, and a lot of information is presented to the reader. The authors must be congratulated on gathering such a large amount of work into a single, coherent paper.

As a technical paper describing the use of a variety of metrics to assess the quality and diversity of multiple assemblies, the paper is publishable “as is”, and therefore everything is a “discretionary revision”. However, the paper could be improved in several ways.

The first question which occurs to me is this: what was the purpose of this international effort? What were the authors trying to achieve? Was it:

  1. To catalogue available assemblers?
  2. To compare available assemblers?
  3. To develop best practice?
  4. To develop a set of guidelines? i.e. which assembler should I use on my data?
  5. To compare assembly metrics?
  6. To develop better assembly metrics?
  7. etc

I would encourage the authors to state the aims of the project fully and clearly; having read the manuscript, I am still unclear what exactly the expected outcomes were, and whether they have been achieved.

The structure of the manuscript is complex. We have several factors at play here:

  1. 3 species
  2. 43 assemblies
  3. 10 different metrics
  4. Contigs vs scaffolds

I found the manuscript quite difficult to read, and a more logical structure might be:

  1. Define the different requirements one might have of an assembly (e.g. longest scaffolds; most genes in contigs etc)
  2. Define (and justify) the metric chosen to answer each question from (1)
  3. For each question/metric…
  4. For each species…
  5. For both contig and scaffold assemblies….
  6. Compare the performance of each assembler

I would have liked to see consistently separate analyses of “contig” and “scaffold” assemblies. For example, the “presence of core genes” analysis is performed only on scaffolds; other analyses separate contigs from scaffolds. The separation is important: as a reader, I would want to know whether, for example, a given assembler is brilliant at producing contigs but awful at scaffolding.

The choice of bird, fish and snake should be justified. This is very important. For the performance comparison of genome assemblers, the ideal scenario would be to select a range of genomes of varying size and complexity, e.g. haploid, small diploid, large diploid, polyploid, non-repetitive, repetitive etc. The three chosen genomes are of similar size (1.6Gb, 1.2Gb and 1Gb) and are all diploid (as far as I can tell). What was the justification for choosing these genomes?

Discretionary revisions

  1. The abstract mentions representation of “regulatory sequences” but I am not sure the paper actually addresses presence/absence of these?

  2. There is an issue of “over assembly”, for which the authors offer some possible explanations (e.g. it’s possible certain teams have assembled multiple haplotypes of the same locus). No attempt is made to validate these explanations. Is it possible to extend the NG50 graphs beyond 100%, to identify assemblers that have “over assembled”?

  3. Should gene-based metrics (“presence of core genes”) be calculated on contig assemblies? The danger is that, in scaffolds, parts of the gene fall in gaps. A very high quality gene-centric assembly would have the majority of genes in contigs. This is an important analysis, I feel.

  4. Ranks are used consistently, but some additional judgement could also be used. For example, in the “COMPASS analysis of VFRs”, it is stated that the Ray assembler ranked 1st for all individual measures except multiplicity, where it ranks 7th. This does not paint the entire picture. Whilst this latter rank of 7th is true, on inspection of Figure 9, it is clear that Ray is still performing very well, and is part of a sub-group of assemblers whose performance is almost equally good. Pointing out that Ray ranks 7th may suggest to the reader that Ray’s performance is bad for multiplicity, when in fact the opposite is true.

  5. Figure 20 shows that for all three species, the z-score is correlated very well with N50. The authors point out that only fish and snake are significant, but there certainly seems a good relationship in bird also. This is a useful conclusion. Whilst the authors are keen to point out that no single metric captures all information, this graph appears to show that if you have nothing else, N50 can be a useful single metric.

  6. In the discussion it is pointed out that the SOAPdenovo entry used mislabelled mate-pair information and therefore the entry is incorrect. I would recommend either removing the SOAPdenovo data point, or repeating the analysis with a new, corrected SOAPdenovo entry.

  7. The discussion of the CoBig and PRICE assemblies seems out of place. The CoBig assembly is not analysed in any way. The PRICE assembly had different aims to every other assembler (to assemble only genic regions), and even in that, it failed, with a lower number of core genes than any other assembly. The fact it would come first if normalised to total amount of sequence assembled is irrelevant, as the aim was not to produce a full genome assembly.
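To illustrate the suggestion in point 2 above, here is a minimal sketch of how an NG(x) curve might be extended beyond 100%. It assumes the usual definition of NG(x) as the length of the shortest sequence in the smallest set of longest sequences that together cover at least x% of the estimated genome size. The function name `ng_curve`, its parameters, and the toy lengths are hypothetical and are not taken from the manuscript.

```python
def ng_curve(lengths, genome_size, max_percent=150):
    """Return {x: NG(x)} for x = 1..max_percent.

    NG(x) is None when the assembly is too short to cover x% of the
    estimated genome size. All names here are illustrative assumptions.
    """
    ordered = sorted(lengths, reverse=True)  # longest sequences first
    curve = {}
    running = 0  # cumulative length of the sequences consumed so far
    idx = 0      # number of sequences consumed so far
    for x in range(1, max_percent + 1):
        # Integer comparison avoids floating-point rounding at the threshold.
        while idx < len(ordered) and running * 100 < genome_size * x:
            running += ordered[idx]
            idx += 1
        reached = running * 100 >= genome_size * x
        curve[x] = ordered[idx - 1] if reached else None
    return curve


# Toy example: 150 units of sequence for an estimated genome size of 100.
toy = ng_curve([50, 40, 30, 20, 10], genome_size=100, max_percent=160)
largest_x = max(x for x, value in toy.items() if value is not None)
```

In this toy case the curve stays defined up to x = 150, so an assembly whose total length exceeds the estimated genome size is visible directly on the plot as a curve that continues past the 100% mark.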

On “Conclusions”

One would hope that the outcome of efforts such as Assemblathon would be to create a useful guide for those new to the field, e.g. to answer questions such as:

  1. If I have a large, diploid genome, which assembler should I use?
  2. If I have a repeat-rich genome, which assembler should I use?
  3. If I have a polyploid genome, which assembler should I use?
  4. Which assembler is “best” at scaffolding?

This relates to the purpose of the project, which I mention above. Whilst the authors have undoubtedly compared a multitude of assembly strategies, using a wide range of metrics, there is still no “best practice” or set of guidelines for choosing the best assembler for a particular biological problem. After such a lot of time and effort, the conclusions and five “practical considerations” seem like a disappointing outcome. I wonder if the authors could put more thought into this? For example, SGA seems very good at assembling the snake genome. Why? What is it about the snake genome that made SGA better than the rest? Was it the repeat structure of the genome? The type and amount of data? If someone comes along with a snake-like genome, should they choose SGA?

Is there any chance the group could take all of the data they have produced, and provide guidance for people who want to assemble genomes but don’t know which assembly strategy is best for their “type” of genome?

Level of interest: An article of importance in its field

Quality of written English: Acceptable

Statistical review: No, the manuscript does not need to be seen by a statistician.

Declaration of competing interests: I declare that I have no competing interests

Mick Watson