Review for "Introgression patterns between house mouse subspecies and species reveal genomic windows of frequent exchange"

Completed on 3 Aug 2017 by Andrew Parker Morgan. Sourced from http://www.biorxiv.org/content/early/2017/07/25/168328.



Comments to author

Author response is in blue.

Ullrich, Linnenbrink & Tautz make the interesting claim that gene flow between mouse populations with varying degrees of differentiation is highest at loci containing genes with functions in olfaction and adaptive immunity. This finding is intuitively very appealing and potentially exciting. It accords well with known roles of odorant receptors in mouse social behavior [1,2], and has echoes of recent descriptions of adaptive introgression of alleles of immune genes from Neanderthal and Denisova into early modern humans [3].

However, I have some rather serious technical concerns that temper my enthusiasm for the manuscript's key results.

###
(a) The authors have chosen to create a single, synthetic consensus sequence to represent each population. They do so by taking the major allele at each variable site over fixed windows of 25kb. Genetic distances between populations are estimated from these sequences. Although phylogenetic trees are not explicitly constructed for each 25kb window, the distances imply a tree and indeed this is how they are interpreted by the authors. I struggle to understand what any of this means. The authors have thrown away the most useful information in the data by ignoring both allele frequencies and LD. A more rational approach would be based either on phased haplotypes (eg. chromosome painting) or on allele frequencies at approximately unlinked sites (eg. D-statistics).
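To make the procedure described in (a) concrete, here is a minimal sketch of a major-allele consensus followed by a Kimura two-parameter (K80) distance between two consensus sequences. It is an illustration only, not the authors' pipeline: the function names, the representation of a window as equal-length aligned strings, and the handling of ambiguous bases are all assumptions made for this example.

```python
import math
from collections import Counter

PURINES = {"A", "G"}

def major_allele_consensus(haplotypes):
    """Build one consensus string from aligned, equal-length sequences
    by taking the most common A/C/G/T base in each column.
    Ties are broken by first occurrence (an arbitrary choice here)."""
    consensus = []
    for column in zip(*haplotypes):
        counts = Counter(base for base in column if base in "ACGT")
        consensus.append(counts.most_common(1)[0][0] if counts else "N")
    return "".join(consensus)

def k80_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned sequences.
    Columns where either sequence has a non-ACGT character are skipped."""
    n = transitions = transversions = 0
    for a, b in zip(seq1, seq2):
        if a not in "ACGT" or b not in "ACGT":
            continue
        n += 1
        if a == b:
            continue
        if (a in PURINES) == (b in PURINES):
            transitions += 1      # purine<->purine or pyrimidine<->pyrimidine
        else:
            transversions += 1    # purine<->pyrimidine
    if n == 0:
        raise ValueError("no comparable sites")
    p = transitions / n
    q = transversions / n
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))

# Hypothetical toy window: three sampled sequences per population.
pop_a = major_allele_consensus(["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTTC"])
pop_b = major_allele_consensus(["GCGTACGTAC", "GCGTACGTAC", "ACGTACGTAC"])
distance = k80_distance(pop_a, pop_b)
```

Note that the consensus step discards the minor allele in each column, so two populations with very different allele frequencies can still yield identical consensus sequences; this is the loss of information that point (a) refers to.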

(b) Even if we accept that a single, synthetic consensus sequence for each population is a meaningful entity, the "dK80" statistic does not necessarily capture "introgression" as claimed. It simply captures (a subset of) departures between the global phylogeny shown in Fig 1 -- which is assumed to exist and to be correct -- and the local phylogeny. An alternative hypothesis to introgression is incomplete lineage sorting (ILS). The authors seem to have done little to distinguish ILS from introgression. Previous work on house mice indicates that ILS is not rare in the mouse genome, although exactly how widespread remains a matter of vigorous debate [eg. 4].

(c) The close overlap between "mutually introgressed" regions and copy-number variable segmental duplications is a big red flag for me. SNV calling was performed with exceedingly liberal filters and little masking for copy number (line 385). The authors are completely correct that previous work has mostly ignored CNV regions -- but this is done with good reason. The approach in this paper conflates allelic and paralogous variation, yielding a local tree that may be distorted not only in branch length (line 394) but also topology. We cannot assume that individual copies of a duplicated gene will have the same evolutionary history, and so cannot lump them together.

A case in point is the Cwc22 gene on chr2 (line 334). The authors point to this locus as an example of "mutual introgression" facilitated by meiotic drive. However, we have recently shown (in great detail) that phylogenetic discordance in this locus is mostly due to non-allelic gene conversion between paralogs of Cwc22 [5].

A second putative example mentioned by the authors is the repeat-heavy long arm of the Y chromosome (line 330). In this case we can definitively rule out the hypothesis that the patterns observed by the authors are due to introgression, because the Y is inherited as a non-recombining unit. Using the same sequence data as these authors, we have shown in a recent preprint that Y chromosomes are completely differentiated between Fra, Ger, Ira, MUS and SPRE [6], in agreement with much previous work [eg. 7]. The same is probably true of CNV regions on the X chromosome (eg. SLX gene family, line 329): inter-(sub)specific gene flow is much lower on most of the mouse X chromosome than the genomic background, at least in part because of the outsized role of the X in hybrid sterility [8].
###

Again, the authors have made a really interesting observation that may be (and to my mind, probably is) true. The amylase pseudogene story is especially compelling. But I feel that the major conclusions have otherwise run ahead of the evidence. I hope that the authors are willing to undertake a re-analysis of their data with more well-established methods for detecting and describing gene flow.

###
REFERENCES
[1] Hastie et al (1979) Cell. http://dx.doi.org/10.1016/0...
[2] Godfrey et al (2004) PNAS. https://doi.org/10.1073/pna...
[3] Abi-Rached et al (2011) Science. http://dx.doi.org/10.1126/s...
[4] White et al (2009) PLoS Genetics. https://doi.org/10.1371/jou...
[5] Morgan et al (2016) Genetics. https://doi.org/10.1534/gen...
[6] Morgan & Pardo-Manuel de Villena (2017) bioRxiv. http://www.biorxiv.org/cont...
[7] Geraldes et al (2008) Molecular Ecology. http://dx.doi.org/10.1111/j...
[8] Teeter et al (2008) Genome Research. https://dx.doi.org/10.1101%...



Great to see that biorxiv is developing into a forum of serious commenting!

We appreciate the comments of Andrew Morgan, since they touch on points that may indeed not have been sufficiently clear in the current version of the manuscript. However, from the outset we need to point out that this is not a standard introgression story, but a major further development compared to our previous haplotype-based findings of introgression in mouse populations (Staubach et al. 2012; PMID: 22956910). The present analysis reveals a much more surprising pattern that has, to our knowledge, not been described before: we see windows of gene exchange across subspecies and species, rather than only unidirectional introgression.

Detailed responses
(a) We have struggled for quite some time to make sense of our initial observations of complex phylogenetic discordance around the amylase locus. The standard introgression statistics did not help much in this case. The ABBA-BABA-based D-statistic addresses only one particular hypothesis, namely introgression from one "donor" into one of two tested "recipients". It requires an outgroup that has not taken part in the introgression and assumes that the introgression is rare and not mutual. Based on our observations at the amylase locus, followed by closer inspection of the whole genome, we realized that neither D-statistics nor haplotype-based analyses are suitable, as mutual introgression seems to be a highly prevalent pattern in the mouse genome. Hence, we chose a phylogenetic discordance approach, which is much broader: it allows us to test not only how much introgression occurred into a single target group, but also how much is found in both and whether it happens across multiple outgroups. Note that D-statistic applications tend to filter out cases where two outgroups would yield the same introgression signal (e.g. Osborne et al. 2016; PMID: 26979797).
The use of consensus sequences is in fact another important point that differs from previous papers, as explained in the first paragraph of the results section. By taking the consensus of the majority alleles, we capture the mostly adaptive parts of the introgression (see simulations in Staubach et al. 2012). Therefore, we are not throwing away useful information, but consolidating it into a form that reflects a population average of the major haplotypes. This is superior to using sequences from individuals (or inbred strains) only, which may harbor low-frequency introgressed haplotypes that do not reflect the population average (which was our finding in Staubach et al. 2012).
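For readers unfamiliar with the ABBA-BABA test discussed in (a), the following is a minimal sketch of the D-statistic computed from derived-allele frequencies in four populations arranged as (((P1, P2), P3), O). It is an illustration under stated assumptions, not an analysis performed by either party: the function name and input layout are hypothetical, and a real application would add a block jackknife or similar procedure to assess significance.

```python
def d_statistic(sites):
    """Compute the ABBA-BABA D-statistic from per-site derived-allele
    frequencies.

    `sites` is an iterable of (p1, p2, p3, p_out) tuples, one per
    approximately unlinked site, for populations P1, P2, P3 and the
    outgroup O in the tree (((P1, P2), P3), O).
    """
    abba = 0.0
    baba = 0.0
    for p1, p2, p3, p_out in sites:
        # ABBA: derived allele shared by P2 and P3, ancestral in P1 and O.
        abba += (1 - p1) * p2 * p3 * (1 - p_out)
        # BABA: derived allele shared by P1 and P3, ancestral in P2 and O.
        baba += p1 * (1 - p2) * p3 * (1 - p_out)
    if abba + baba == 0:
        raise ValueError("no informative sites")
    return (abba - baba) / (abba + baba)

# Hypothetical toy data: two ABBA sites and one BABA site.
example_sites = [(0.0, 1.0, 1.0, 0.0), (1.0, 0.0, 1.0, 0.0), (0.0, 1.0, 1.0, 0.0)]
d = d_statistic(example_sites)  # positive: excess sharing between P2 and P3
```

Because the statistic is a signed difference of ABBA and BABA counts, gene flow from P3 into both P1 and P2 in the same window would tend to cancel out, which is the limitation for mutual introgression that response (a) describes.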

(b) Yes, our approach is a test for phylogenetic discordance (including explicit trees), comparable to White et al. 2009 (PMID: 19936022). Although this paper is often cited as evidence for ILS, the authors concluded that the patterns they see are not readily compatible with ILS. In fact, now that we discover more and more often that introgression is frequent, ILS should no longer be the null model (for many reasons, which could be discussed independently). But independent of this, the patterns that we are tracing here can in no way be explained by ILS, since they occur in the same windows across subspecies and species.

(c) In fact, we had the same "red flag" feeling when we first saw these patterns. For our initial analyses we had used the standard strict filtering criteria for the mapping of reads. Interestingly, this actually resulted in artificial discordance signals in additional CNV regions, mostly due to much reduced, and thus erratic, coverage. Removing the filter resulted in much better consolidated patterns, i.e. we also find CNV regions that show no signs of introgression. These include olfactory and vomeronasal receptor clusters: most show no introgression, and only a subset shows the signal. Hence, it is not the CNV structure itself that causes the problem. To further consolidate this, we performed simulations of the actual tree topology and the actual mapping procedure. These showed that the mapping per se is not expected to lead to artifacts.
Further, we also have to take the outcome of the analysis into account, and this does make biological sense. Many immunity-related genes (which are perfect candidates for adaptation) occur in gene families. By cutting CNVs out of our analysis we would have mostly missed this interesting finding. Interestingly, not all immune gene clusters appear to be equally affected. The T-cell receptor cluster on chromosome 14 shows at best only a weak signal (see chr14:51,727,421-54,372,465 in the public wildmouse-introgression track), further confirming that the CNV structure per se is not a problem in our analysis.
Concerning the note on Cwc22: gene conversion from paralogs is of course an expected effect in gene families and should lead to concerted evolution, which is actually the opposite of phylogenetic discordance. Your paper does not really test introgression as an alternative interpretation of the discovered discordance, although an apparently introgressed allele is observed in M. spretus. This is not meant as a critique of the paper, but it should not be held against our interpretation.
Concerning the note on the Y-chromosome loci: We had mentioned this only as an initial observation, since we have not included the Y chromosome in our overall analysis. We agree that the analysis provided in your preprint offers an alternative explanation for the shallow trees observed in this region; note that we actually find no phylogenetic discordance in these cases, only a shallow tree. But we should also not completely exclude the possibility of introgression into non-recombining parts of the Y chromosome. Sperm can absorb foreign DNA (see review by Smith and Spadafora 2005; PMID: 15832378) and, given that there is frequent multiple mating in mice, it would not seem impossible that an occasional carry-over of Y-chromosomal DNA might occur. In fact, this might even be the mechanism that allows transfer of DNA between nominal species.