Review for "sCNAphase: using haplotype resolved read depth to genotype somatic copy number alterations from low cellularity aneuploid tumors"

Completed on 5 May 2016


Comments to author

Author response.

This manuscript describes a new approach to the analysis of somatic copy number variants. The approach is somewhat related to that of the Battenberg algorithm and other algorithms that phase alleles in order to improve copy number calls, but it uses an alternative phasing algorithm and also estimates ploidy and cellularity.

The authors demonstrate that the method works well on data down to very low cellularity on a range of cell line mixture datasets.

The idea here appears interesting, and further exploration of approaches to improve allele-specific copy number analysis and new implementations are worthwhile. The method looks promising and the manuscript may eventually deserve to be published, but not in its current state. The writing in the manuscript needs much improvement. This made it very difficult to assess.


* the description of the methods is not sufficiently clear

The entire methods section (Methods, pages 5-10) was completely rewritten to provide a clearer, more concise summary of sCNAphase and the analyses performed here. The new section covers the origin of the datasets, the terminology, the statistical model behind sCNAphase, and the methods and tools used in the analyses. This involved rewording some of the sentences (as pointed out by reviewer 2), breaking some longer paragraphs into short subsections and changing the introductory flow. The new methods section is now arranged as follows:

Phasing Matched Normal
Calculating regional haplotype depth
Statistical model of haplotype depth under null hypothesis of absence of tumour DNA
Tumor purity and tumor cellularity
Hidden Markov Model based on haplotype segments
Estimation of tumor cellularity and ploidy
Estimation of copy number profile
Merging errors at copy number switches
Comparison of two copy number segmentations

* the method needs to be put into context of other tools better

We have expanded the explanations of how other tools work (particularly ASCAT and ABSOLUTE) in the introductory section (page 3, lines 65-75), and used this to better introduce the methodological innovations of sCNAphase in the methods section.

“More recent technologies, such as single-nucleotide polymorphism (SNP) microarrays has provided powerful approaches for interrogating the tumor genome and identifying copy number mutations (13,19-22). These include ASCAT (23) and ABSOLUTE (13), both of which initially pre-segment SNPs into regions of equal copy-number (using a threshold based, model-free approach) and subsequently estimate ploidy and tumour purity by use of a model for the observed read-depth data conditional on the fixed segmentation. ASCAT and ABSOLUTE are highly successful in samples with as little as 40% tumor DNA (13,23), however the reliance on an initial model-free segmentation is likely to limit the ability of these methods to detect copy number alterations at lower tumour cellularities (Supplementary Table S1). The performance of these tools are also restricted by the resolution of different microarray platforms as well as fluorescence signal saturation at high copy number.”

We explain in the following paragraph on methods for HTS that "Both ASCAT and ABSOLUTE have been modified to be applicable to HTS".

* a more thorough comparison with additional tools is required. For example, ASCAT can be applied to sequence data and estimates purity and ploidy.

We have included results from ASCAT in the revised manuscript. The results can be seen on pages 13-15 and in Figures 4 and 5. Overall, ASCAT performed very well above 60% tumour purity, but its performance fell at lower purities.

Specific comments:

p5-12: The entire methods section needs a careful rewrite. This was very difficult to follow. Please check that acronyms are defined when they are first used (e.g. MMP, AD)

p5, line 40: "All the tumor cell-lines are cultured" should be "All of the tumor cell-lines were cultured".

Last paragraph. This subsection needs to explain the method precisely and in detail, with version numbers of tools.

Line 55: "We chose SHAPEIT2 for haplotype phasing to perform" - rewrite.

p6, line 6: "however tumor genome is" should be "however tumor genomes are"

Line 17: AD not defined when first used

Line 21: "which is most fitted to the merged read depths and phased allelic depths"

Line 23: estimations -> estimates.

Line 33 multithread -> multithreading

p7, Line 5: allelic-specific -> allele-specific

Line 8: "aproximates" --> "is approximated by"

Line 22: Define M and P. e.g. "maternal (M) and paternal (P) alleles for SNP i". Then useful to also state MMP is 2 maternal copies and 1 paternal.

Please carefully go over the entire methods section to make sure that it is crystal clear.

We have addressed these points in the revised methods section.
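
For illustration only, the M/P notation requested above can be made concrete with a generic two-component mixture calculation. This is a minimal sketch and not the sCNAphase model: the function name `expected_maternal_fraction` is hypothetical, cellularity is taken to mean the fraction of cells in the sample that are tumor cells, and normal cells are assumed to be diploid with one maternal and one paternal copy (MP). Under those assumptions, a genotype such as MMP (two maternal copies, one paternal) gives the following expected maternal read fraction at a heterozygous SNP:

```python
from collections import Counter


def expected_maternal_fraction(genotype: str, cellularity: float) -> float:
    """Expected fraction of reads carrying the maternal allele at a heterozygous SNP.

    genotype    -- tumor genotype written as a string of 'M' and 'P', e.g. "MMP"
    cellularity -- assumed fraction of cells that are tumor cells, in [0, 1]

    Assumption for this example: normal cells are diploid "MP".
    """
    if not 0.0 <= cellularity <= 1.0:
        raise ValueError("cellularity must be between 0 and 1")

    counts = Counter(genotype.upper())
    if set(counts) - {"M", "P"}:
        raise ValueError("genotype may only contain 'M' and 'P'")
    n_m, n_p = counts["M"], counts["P"]

    # Maternal copies per average cell: tumor cells contribute n_m, normal cells 1.
    maternal = cellularity * n_m + (1.0 - cellularity) * 1
    # Total copies per average cell: tumor cells contribute n_m + n_p, normal cells 2.
    total = cellularity * (n_m + n_p) + (1.0 - cellularity) * 2
    if total == 0:
        raise ValueError("no copies present; allele fraction is undefined")
    return maternal / total


if __name__ == "__main__":
    for c in (1.0, 0.5, 0.1):
        print(f"MMP at cellularity {c:.1f}: {expected_maternal_fraction('MMP', c):.4f}")
```

In this sketch the maternal fraction for MMP is 2/3 in a pure tumor sample and moves toward 1/2 as cellularity decreases, which is one way to see why allelic imbalance is harder to detect in low-cellularity samples.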

p13. Start of results section.

The method should be briefly described in the results section.

We now briefly describe the methods in the results section.

The motivation for using these datasets needs to be made clear here, and at the start of the results. It occurred to me when reading the Figure 4 caption that at least some of these samples were carefully selected to cover a range of ploidies (hyper-tetraploid, hypo-tetraploid, etc.). This is a great design, but it is not stated explicitly anywhere.

A brief rationale for the benefits of this selection of tumors is included in the initial paragraph of the results section (see response to previous question). We have also modified Table 1 to include more clinical information about the primary tumor the cell-lines were derived from.

p22, line 5: "tumor purity is frequently low (10% at most)". The first part of this statement is true. I think the second part "(10% at most)" is uncertain at this stage and should be qualified e.g. "(purities of at most 10% have been found in some studies)" or removed.

We have removed reference to 10% here.

p28: Formatting of Table 2 headers

Figure 3 is very unclear. The dSKY parts of the figure are minuscule and need to be bigger and better explained.

To better introduce digital SKY (dSKY) plots, we created a new, simpler figure. Using the copy number mutations seen in three chromosomes of the HCC1187 cell-line, the new Figure 3 demonstrates that dSKY plots mirror the copy number changes shown by traditional SKY images and also show information that the traditional approach cannot, such as the specific location of sCNAs and regions that have undergone a Loss of Heterozygosity event. Finally, in Figure 3C, we show the power of this approach to deconvolute a complex copy number profile through the analysis of chr1. The previous figure has been combined with Supplementary Figure 4. This information is presented on page 12, lines 317-339.

p21, line 40 & p34: "sCNAphase was found to consistently outperform this methodology" and other statements. It seems sCNAphase generally significantly outperforms CLImAT, but not always (e.g. 5C). Please describe the results more accurately. Add more tools to this comparison. Can you account for the occasional better performance of CLImAT (in terms of sensitivity)?

We have added the following:

"One limitation for sCNAphase to reach even higher sensitivity was that sCNAphase provided no estimation at long regions with few SNPs or with highly variable depth profile (see Methods). For example, chr9p1-13 of HCC1187 was defined as a region of LOH by both COSMIC and CLImAT (Figure 5E); however sCNAphase only identified small islands of LOH (marked in pink) and left the majority as undetermined. When investigating the raw BAFs from this location (Figure 5F), the majority of the BAFs fell randomly between 0 to 1, producing a depth profile that was too confounding for sCNAphase to confidently resolve."

We have also included ASCAT in this comparison.