Keith R. Bradnam, Joseph N. Fass, Anton Alexandrov, Paul Baranay, Michael Bechner, İnanç Birol, Sébastien Boisvert, Jarrod A. Chapman, Guillaume Chapuis, Rayan Chikhi, Hamidreza Chitsaz, Wen-Chi Chou, Jacques Corbeil, Cristian Del Fabbro, T. Roderick Docking, Richard Durbin, Dent Earl, Scott Emrich, Pavel Fedotov, Nuno A. Fonseca, Ganeshkumar Ganapathy, Richard A. Gibbs, Sante Gnerre, Élénie Godzaridis, Steve Goldstein, Matthias Haimel, Giles Hall, David Haussler, Joseph B. Hiatt, Isaac Y. Ho, Jason Howard, Martin Hunt, Shaun D. Jackman, David B Jaffe, Erich Jarvis, Huaiyang Jiang, Sergey Kazakov, Paul J. Kersey, Jacob O. Kitzman, James R. Knight, Sergey Koren, Tak-Wah Lam, Dominique Lavenier, François Laviolette, Yingrui Li, Zhenyu Li, Binghang Liu, Yue Liu, Ruibang Luo, Iain MacCallum, Matthew D MacManes, Nicolas Maillet, Sergey Melnikov, Bruno Miguel Vieira, Delphine Naquin, Zemin Ning, Thomas D. Otto, Benedict Paten, Octávio S. Paulo, Adam M. Phillippy, Francisco Pina-Martins, Michael Place, Dariusz Przybylski, Xiang Qin, Carson Qu, Filipe J Ribeiro, Stephen Richards, Daniel S. Rokhsar, J. Graham Ruby, Simone Scalabrin, Michael C. Schatz, David C. Schwartz, Alexey Sergushichev, Ted Sharpe, Timothy I. Shaw, Jay Shendure, Yujian Shi, Jared T. Simpson, Henry Song, Fedor Tsarev, Francesco Vezzi, Riccardo Vicedomini, Jun Wang, Kim C. Worley, Shuangye Yin, Siu-Ming Yiu, Jianying Yuan, Guojie Zhang, Hao Zhang, Shiguo Zhou, Ian F. Korf
Review posted on 23rd February 2013
Bradnam et al. systematically evaluate and compare 43 de novo genome
assemblies of 3 organisms from 21 teams. My lab and I set out to
evaluate the paper.
First, reviewing this paper thoroughly is effectively impossible; it's
huge and complicated! So apologies for any mistakes below...
At a high level, this paper is a tour de force, analyzing the results
of applying a dozen or more different de novo assembly pipelines to
three different data sets, and ranking the results by a variety of
different metrics. The three genomes chosen, fish, snake, and bird,
are all vertebrate genomes, so they're large and repeat-ridden, and in
some cases highly polymorphic, which makes this an extra challenging
(but realistic) set of assembly problems. The major problem to be
overcome by this paper is that we are evaluating fuzzy heuristic-laden
software against fuzzy error-prone data in a situation where we don't
know the answer, and I think given these constraints the authors do as
good a job as is reasonably possible.
The resulting paper does an excellent job of broadly presenting the
challenges of assembly, providing a good if rather high level
discussion of the various ranking metrics they used. Their broad
conclusions were well supported: assemblers do very different things
to the same data, and you need to pick an approach and a set of
parameters that maximize the sensitivity and specificity for your
project goals; and repeats and heterozygosity will kill you.
From a scientific perspective, I was dismayed by their failure to make
use of external data such as synteny and gene model concordance to
evaluate the assemblies. The CEGMA scores were probably the closest
thing to this, but the numbers were surprisingly low to me, so either CEGMA
doesn't work that well on vertebrate genomes or the assemblies are
actually worse than the paper made clear. The fosmid and optical map
analyses were not that convincing, because while they spoke to some
sort of basic agreement with orthogonal data, they didn't have the
breadth (fosmid) or resolution (optical map) to provide really solid
independent evidence of the quality. When I am evaluating assemblies
I look for "surprises" in terms of missing gene models and odd
rearrangements compared to neighbors, and I feel like there are some
reasonably straightforward things that could have been done here.
Nonetheless, the analyses that were done were done well and discussed
well and led to clearly defensible conclusions.
I was specifically asked to evaluate reproducibility or replicability
of the analyses, which I will address below.
A major missing component of the paper was computational cost. As
someone who works primarily on how to achieve assemblies on low-cost
hardware, I can assure assembler authors that very few people can
easily run multiple assemblies on large amounts of data using their
assemblers. This (and ease of use, documentation, and community) is
honestly going to drive choice of assembly pipelines far more so than
notional correctness. This is especially true since a conclusion of
the paper was "try lots of assemblers because they all do differently
weird things on different data", which would lead me to the
time-saving argument that a 60% accurate easily achievable assembly is
considerably better than an 80% assembly that cannot readily be
computed. Perhaps assemblathon 3 can be more tuned towards the
question of whether or not anyone other than the authors can run these
things and achieve good results!
While I'm talking about what I wish could have been done, it would
have been nice to have something like RNAseq for the organisms.
RNAseq can be used to look at completeness as well, by looking at the
intersection between conserved genes and genes that map to the
assembly, and I think it would have been invaluable. Internally
focused statistical analyses are great, but there's nothing like
orthogonal data (as with the fosmids and the optical map) for real
validation.
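To sketch the kind of RNAseq-based completeness check I have in mind (this is a minimal illustration with hypothetical gene-name sets, not an analysis from the paper): take a set of genes conserved across vertebrates and the set of transcribed genes whose transcripts align to the assembly, and report what fraction of the conserved set is recovered, CEGMA-style:

```python
def rnaseq_completeness(conserved_genes, mapped_genes):
    """Estimate assembly completeness from RNAseq evidence.

    conserved_genes: gene identifiers expected in any vertebrate genome.
    mapped_genes: genes whose transcripts align to the assembly.
    Returns (fraction recovered, sorted list of missing genes).
    """
    conserved = set(conserved_genes)
    recovered = conserved & set(mapped_genes)
    missing = conserved - recovered
    return len(recovered) / len(conserved), sorted(missing)

# Hypothetical example: 4 conserved genes, 2 recovered by RNAseq mapping.
frac, missing = rnaseq_completeness(
    ["geneA", "geneB", "geneC", "geneD"],
    ["geneA", "geneC", "geneX"],
)
```

The interesting output is as much the list of missing genes (the "surprises" I mentioned above) as the summary fraction.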
I also could not figure out how much of the input data was used for
each assembly, and it didn't look like any of the analyses took this
into account -- REAPR is the only one that I would have expected to do
so, but that paper doesn't seem to be available. How many of the
input reads actually map to the genome (or how many of the
high-abundance k-mers are present) would have been an interesting
metric, although I recognize that repeats and het make this a
difficult metric to analyze.
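To make the k-mer version of this metric concrete, here is a minimal sketch (my own illustration, not anything from the paper): count k-mers in the reads, keep those above an abundance threshold (likely genuine genomic sequence rather than sequencing error), and report what fraction of them appear in the assembly:

```python
from collections import Counter

def kmers(seq, k):
    """Yield all k-mers of a sequence."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]

def kmer_inclusion(reads, contigs, k=21, min_abundance=3):
    """Fraction of high-abundance read k-mers present in the assembly.

    High-abundance k-mers are probably real genomic sequence, so
    their absence from an assembly suggests missing content --
    though repeats and heterozygosity muddy the interpretation,
    as noted above.
    """
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    trusted = {km for km, c in counts.items() if c >= min_abundance}
    present = set()
    for contig in contigs:
        present.update(km for km in kmers(contig, k) if km in trusted)
    return len(present) / len(trusted) if trusted else 0.0
```

A real implementation would also canonicalize reverse complements and use a disk-backed or probabilistic counter, since vertebrate-scale k-mer tables will not fit comfortably in a plain dict.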
Finally, I think the fact that experts (in many cases the authors of
the assembler) are running the assemblers should be mentioned more
clearly: these results are presumably the best possible from those
software packages, and 3rd-party users are unlikely to do as well.
We were explicitly asked to assess reproducibility (technically,
replicability). Here my group and I were disappointed, on two counts.
First, the instructions for replicating the assemblies are in some
cases very sparse, and frequently missing entirely. Did the ABySS
team really not do any read trimming? (SI file 3, p2-3) Did ALLPATHS
really do no trimming? The Ray team should provide the "few
modifications" somewhere, too. SGA? PRICE? Am I missing these
entries, or were they not submitted?
Second, the "forensic" evidence is similarly lacking. There
appears to have been little standardization of how to report the exact
pipeline used, the versions of the software used, the amount of CPU
time and memory required, etc. I think this was a missed opportunity.
It's probably too late to remedy and shouldn't kill the paper, of
course.
Basically, if replicability of assemblies is considered important for
publication, the material in this paper needs some skeptical review
and revision by the authors. At the very minimum, I would request
that each included team provide the parameters and software versions
used.
Miscellaneous comments and questions:
The BCM-HGSC fish assembly used Newbler to assemble Illumina data.
Any special details on getting this to work? We were under the
impression that Newbler couldn't be used on Illumina data.
We found the penalty for PacBio/Ns when used for scaffolding to be
The fosmid data was used by the ABySS fish team in their assembly,
and it doesn't sound like anybody else used it. Apart from possible
circularity in the analysis, this also might be noted (I didn't
see it, but I could have missed it).
The BCM team also claims they used Velvet for their assembly (Table
1), which was also used to assemble the fosmids. This potential
circularity presumably was addressed in the withheld-metric
analysis but might be worth mentioning somewhere. (Although I didn't
see Velvet mentioned in the actual assembly-pipeline details in
the SI, so maybe the Table 1 entry is wrong?)
Level of interest: An exceptional article
Quality of written English: Acceptable
Statistical review: Yes, but I do not feel adequately qualified to assess the statistics.
Declaration of competing interests: I declare that I have no competing interests.