Open preprint reviews by Peter Ellis

Sequence and structural diversity of mouse Y chromosomes

Andrew Parker Morgan, Fernando Pardo Manuel de Villena

What an absolutely fascinating paper.

I have a few questions and comments - some of them likely quite naive as statistical genetics is not my area!


Lines 167-176:

You counted gene copies directly and confirmed the earlier finding that musculus has much higher copy numbers of Slx and Sly relative to domesticus (Fig. 5). However, you also showed that the proportion of the red/blue/yellow amplicons is the same in each species (Fig. S3A), and that the domesticus Yq is, if anything, slightly larger on average than the musculus Yq (Fig. S3B).

How can these observations be squared with each other? If domesticus Yq is larger than musculus Yq, and they both have the same proportion of the red amplicon containing Sly - how can musculus have a larger copy number of Sly? I'm not sure what these different measurements are telling us.


Lines 184-190:
You looked for a signal of selection at the co-amplified loci on the X, but observed no reduction in genetic diversity surrounding the X ampliconic regions.

Given that one mode (most likely mode?) of expansion of these clusters is by nonallelic homologous recombination, does this affect the calculation? It seems plausible that it would, since the same effective mutation - expansion of the cluster - can occur recurrently on different haplotypes and also spread horizontally between haplotypes by recombination within the gene cluster. This doesn't apply to males, and so the males would have a much greater reduction in diversity associated with selection on the amplicons.


Lines 194-199:
You find that for X and autosomal genes, there is more variation between tissues and less between species, compared to the Y chromosome. e.g. PC1 and PC2 for X+A genes represent tissue specificity whereas PC2 for Y genes represents species differences.

To what extent is this due to Y genes being almost exclusively testis specific? When genes are expressed in multiple tissues, there's room for a lot of complexity, which will show up in the PCA. When genes are expressed in only one tissue, one principal component is sufficient to encapsulate that fact, and so PC2 will necessarily relate to something else.
What happens if you compare Y-linked genes to testis-specific (or spermatid-specific) genes on the X and autosomes? Does the Y still show up as having increased expression divergence between species? I suspect it will, but it would be nice to check.


Lines 217 onwards:
Yes, there's definitely more complexity here and it's not just a linear function of Slx:Sly ratio. In our original paper (Cocquet et al 2012) and the preceding shSLX knockdown paper, we found that knocking down Slx on its own didn't really affect X gene expression as a whole, although there was a sex ratio skew. It may be that there are thresholding effects - e.g. as long as you have "enough" Sly around to prevent Slx from accessing chromatin, then adding more Sly beyond that point won't affect X gene expression any more.

Deficiency in the multicopy Sycp3-like X-linked genes Slx and Slxl1 causes major defects in spermatid differentiation.
Cocquet J, Ellis PJ, Yamauchi Y, Riel JM, Karacs TP, Rattigan A, Ojarikre OA, Affara NA, Ward MA, Burgoyne PS.
Mol Biol Cell. 2010 Oct 15;21(20):3497-505.

Julie has also recently shown that SSTY proteins interact with all the Slx/Slxl1/Sly family and may affect their ability to enter the nucleus - so not only do Slx/Slxl1/Sly likely compete for binding to particular chromatin sites, they may also compete for some factor that transports them into the nucleus.

SSTY proteins co-localize with the post-meiotic sex chromatin and interact with regulators of its expression.
Comptour A, Moretti C, Serrentino ME, Auer J, Ialy-Radio C, Ward MA, Touré A, Vaiman D, Cocquet J.
FEBS J. 2014 Mar;281(6):1571-84. doi: 10.1111/febs.12724.


Lines 248-251 and Figure 7B:
What is the gene copy number in the lines used for the DXD and MXM crosses? Given that you've now documented extensive variability within as well as between species, I think it would be useful to include this information in the figure.

What was the denominator for the Slx/y family here? Did you count both Slx and Slxl1, or just Slx? Does the interpretation change if you use just one or the other? It's not clear to me that we yet know whether Sly is directly competing with Slx, Slxl1 or both.

It might also be interesting to normalise the activity for the gene copy number in each case. Slx and Sly have a fundamental difference in that (if the underlying hypothesis is true that Slx promotes expression from the sex chromosomes and Sly represses it), Slx has a positive feedback on itself, while Sly has a negative feedback on itself.

Thus a comparatively small change in Slx copy number could have a disproportionately large effect, while a large change in Sly copy number will be "buffered" by the negative feedback. In the 2/3 Yq deletion mice, expression of Yq genes drops by less than 50%, since each copy is transcribed at an intrinsically higher level. I suspect this may be a contributory factor to the sheer size of the Yq amplicons - a small amplification on the X triggers a much greater degree of amplification on the Y, because the Y has to fight through the fog of its own negative feedback.


Lines 308-310:
Do you mean to say that you cannot detect a signature of sex ratio skewing, or that you can definitively rule out sex ratio skewing? In a conflict scenario such as the one hypothesised, then the historical situation could well be one of constant change - sometimes the X has the upper hand and the sex ratio is female biased, sometimes the other way round. Would that not obscure the signature of any particular episode of skewing?


Lines 370-371:
You say, "Sex-ratio distortion has been observed in the offspring of males with X:Y copy-number mismatch in some experiments (Cocquet et al., 2009; Case et al., 2015) but not in others (Turner et al., 2012; Albrechtová et al., 2012)."

The Turner paper did show some effects in the predicted direction. From their Table 2:
Domesticus offspring = 31/59 = 52.5% female
Hybrid domesticus (with domesticus Y) = 58/105 = 55.2% female
Hybrid musculus (with musculus Y) = 88/206 = 42.7% female
Musculus = 42/76 = 55.3% female

With low numbers in each group, these differences are not all significant (a power calculation for a 10% skew requires about 400 in each group for 80% power), but I certainly don't think an effect can be ruled out. In particular, they didn't explicitly break down the groups by the proportion of the X chromosome coming from the introgressed background in each case, i.e. their hybrid groups may include animals where only autosomal loci have introgressed and the sex chromosomes are congruent. Indeed, their own conclusion was that "the trend in our data is consistent with a sex ratio distorter on the musculus Y which is effective only on a partially domesticus background".
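As a rough check on the figures above, here is a minimal sketch that recomputes the percentages from the quoted counts and estimates the group size needed to detect a skew. It assumes a two-sided two-proportion z-test at alpha = 0.05 with unpooled variance, and it reads a "10% skew" as 50% versus 60% female offspring; both are assumptions made for illustration, not details taken from the Turner paper. The function names are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def percent_female(females, total):
    """Percentage of female offspring, rounded to one decimal place."""
    return round(100 * females / total, 1)


def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Approximate group size for a two-sided two-proportion z-test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_power) ** 2 * variance / (p1 - p2) ** 2)


# (females, total offspring) as quoted above from Turner et al., Table 2
groups = {
    "domesticus": (31, 59),
    "hybrid, domesticus Y": (58, 105),
    "hybrid, musculus Y": (88, 206),
    "musculus": (42, 76),
}
for name, (females, total) in groups.items():
    print(f"{name}: {percent_female(females, total)}% female")

# ASSUMPTION: a "10% skew" is read as 50% versus 60% female offspring.
print("offspring needed per group:", n_per_group(0.50, 0.60))
```

Under these assumptions the required group size comes out a little under 400, well above the size of any group in the table.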

The Albrechtova paper made no measurements of sex ratio and I'm not sure why you're citing it here. They looked at sperm counts and sperm velocity in an area with an introgressed Y which is already known to affect sex ratio, and found that,

"In the section of the HMHZ we studied, the YMUS chromosome has introgressed across the zone in apparent disregard of Haldane's rule and this introgression is associated with a shift in the sex ratio in favour of males [6]. In the current study, we find that in the presence of the invading Y chromosome the most extreme reduction of SC in hybrid individuals is more than rescued, to the extent that an apparently domesticus male with the introgressed YMUS chromosome is expected to have higher SC than one with its consubspecific Y."

i.e. introgression of the musculus Y is favoured because it rescues adverse sperm phenotypes in hybrid males. Their reference 6 is to the following paper, which is relevant and should be cited in your paper.

Macholán M., Baird S. J. E., Munclinger P., Dufková P., Bímová B., Piálek J. 2008. Genetic conflict outweighs heterogametic incompatibility in the mouse hybrid zone? BMC Evol. Biol. 8, 271–284


Lines 378-383:
Here, there are two independent deletions of ~2/3 of Yq that should be cited - the one from Conway et al that you already have, which arose on an RIII background, and also one from Josefa Styrna that arose on a B10.BR background. The paper from Macholán et al is also probably best mentioned here as a "real world" example of sex ratio alteration associated with Y chromosome introgression.

Influence of partial deletion of the Y chromosome on mouse sperm phenotype.
Styrna J, Klag J, Moriwaki K.
J Reprod Fertil. 1991 May;92(1):187-95.

Regarding the paper by Fischer et al on C57Bl/6JBomTac, all they say in the paper is that there are no reports of sperm abnormalities or sex ratio skewing in this line. So far as I'm aware, nobody's looked yet, so this is certainly worth checking. I don't think we can assume anything from the current absence of evidence, though.

Another comment that occurs to me at this point - when you were looking at the "co-amplified" X genes for signatures of selection, how did you define co-amplification? If I am reading the paper right, it looks like you looked specifically at the direct homologues of the Y-linked ampliconic genes, i.e. Sstx, Slx, Slxl1 and Srsx.

In looking for signatures of selection around X-linked genes, I think it is imperative to first consider which X genes are likely to be affected by the conflict. The Slx/Sly conflict seems to be mediated by varying the strength of PSCR, i.e. a GLOBAL regulation of sex chromosome expression in spermatids. The prediction therefore is that if Sly-mediated repression increases, EVERY dosage-sensitive, spermatid-expressed gene on the X and Y will come under selection to increase its activity.

This is what we saw in our 2011 paper - the proliferation of Slx and Sly in the Palaearctic clade is associated with an increase in copy number at almost all the X-linked ampliconic genes, not just the direct homologues of the Y-linked ampliconic genes. We also showed that the net transcription level of the X amplicons stayed approximately constant across species despite an increase in copy number. We interpreted this as showing that the X-linked genes are being selected to maintain functionality despite increasing postmeiotic repression.

In your data we would therefore predict a signature of selection not just at the specific homologues of Y-linked ampliconic genes, but at many of the other X-ampliconic genes. This would confound attempts to detect selection by comparing the X-Y homologous genes to the rest of the chromosome.

Similarly, a selective signature from the conflict may not be restricted to ampliconic genes. All we can predict is that as Sly repression increases, X- and Y-linked genes are forced to respond _in some way_. That does not only mean gene amplification. Any given gene could respond by an increase in copy number (more copies) - but it could also respond with an increase in promoter strength (more transcripts per copy), improved translation efficiency (more protein per mRNA molecule) or an increase in protein function (more functional activity per protein molecule).

For example, there is a single Zfy gene in rat. In mouse this has become duplicated to give Zfy1 and Zfy2 (gene copy number change), Zfy2 has acquired a new spermatid-specific promoter (increased transcription from one gene copy), and Zfy2 has additionally become a stronger transcriptional activator (increased function per mRNA transcript). I can't prove (yet) that this is linked to the Slx/Sly conflict, but it looks to me like it may be.

Whatever the form of response, if it was driven by selection, it should in principle leave some signature around many of the spermatid-expressed genes on the X. How does the analysis in figure S4 change if you look at the DNA surrounding all the spermatid-expressed genes on the X? Given that there are rather a lot of them(!) it may be that they all run into one and you won't be able to find a specific loss of diversity around each gene, just a loss of diversity across the X as a whole.

If you do try this, you may need to treat spermatid-specific genes separately from genes expressed more widely. Widely-expressed genes will be constrained by the fact that increasing their activity in spermatids may also increase their expression in other cell types, whereas spermatid-specific genes will be freer to respond to the conflict. I think this is what's going on in Larson et al 2016a when they report that some genes show transcriptional alteration in pre-meiotic spermatogonia in the different species and F1 hybrids: some widely-expressed X-linked genes may have been selected for stronger promoter activity to overcome Sly-mediated repression in spermatids. This keeps overall transcription reasonably constant in spermatids, but now leaves them overdosed in the spermatogonia.

And finally (!)
The potential selective signature of the Slx/Sly conflict may not be restricted to the sex chromosomes - there are also a few ampliconic autosomal loci that appear to be regulated by Slx and Sly. These include Speer genes (Cocquet et al 2009, 2012) and a block of genes on chromosome 14 (Larson et al 2016a, fig 4C). It might be that a look at these areas would show something interesting. Possibly it would even be easier to see a signature of selection here, since so far as I'm aware these are discrete blocks of genes rather than chromosome-wide regulatory effects.

I know I said "and finally". I lied, sorry!

Hopefully really final question: What is known structurally about the X chromosome in the different species? The mouse X is rearranged relative to other mammals, and even relative to rat. That's pretty unusual in mammals! See for example figure 6 from the rat genome paper.

If multiple genes on the X come under selection simultaneously, I can see inversions getting fixed if they lock together particularly well-adapted sets of genes. Same argument for why X and Y chromosomes diverge, except this time it would be competing X haplotypes. That too would suppress diversity across the entire chromosome rather than giving a gene-specific signature.


Thanatotranscriptome: genes actively expressed after organismal death

Alexander E Pozhitkov, Rafik Neme, Tomislav Domazet-Loso, Brian Leroux, Shivani Soni, Diethard Tautz, Peter Anthony Noble

"We see this pattern in many of the transcriptional profiles. This is difficult to explain by the 'stable gene transcript' idea."

This is incorrect - it is trivial to explain almost any pattern in terms of differential cell death and differential transcript stability. To illustrate this, consider the following toy model system:

Cell composition

In the model, the tissue you are analysing has two cell types in equal proportions - muscle cells and fibroblasts. We will assume that muscle cells have a high requirement for oxygen and will die within minutes of organismal death. Fibroblasts are more resilient and will live for a week. Transcription continues until the cell dies, after which point the mRNA begins to decay.

Gene expression per cell
In the model, we assume that each cell type expresses just three genes, at equal absolute levels.
1) An unstable housekeeping gene (U in both cell types) where the transcript takes 1 day to decay.
2) A stable housekeeping gene (S in both cell types) where the transcript takes 2 days to decay.
3) A highly stable tissue specific gene (M for muscle and F for fibroblasts) where the transcript takes 3 days to decay.

What happens in the experiment?
Both cell types are alive at the instant of organismal death.
- Muscle cells contain 1 unit of M, 1 unit of U and 1 unit of S
- Fibroblasts contain 1 unit of F, 1 unit of U and 1 unit of S

You measure:
M = 16.7% of the total sample
F = 16.7% of the total sample
U = 33% of the total sample
S = 33% of the total sample

The muscle cells now die, and the transcripts within those cells start to decay.

***DAY 1
Transcripts in dead muscle cells are decaying with rates as specified above.
Fibroblasts are still alive
- Dead muscle cells contain 2/3 unit of M, 0 units of U and 1/2 unit of S
- Fibroblasts contain 1 unit of F, 1 unit of U and 1 unit of S

You measure:
M = 16% of the total sample = DOWNREGULATED FROM D0
F = 24% of the total sample = UPREGULATED FROM D0
U = 24% of the total sample = DOWNREGULATED FROM D0
S = 36% of the total sample = UPREGULATED FROM D0

***DAY 2
Transcripts in dead muscle cells are decaying with rates as specified above.
Fibroblasts are still alive
- Dead muscle cells contain 1/3 unit of M, 0 units of U and 0 units of S
- Fibroblasts contain 1 unit of F, 1 unit of U and 1 unit of S

You measure:
M = 10% of the total sample = DOWNREGULATED FROM D1
F = 30% of the total sample = UPREGULATED FROM D1
U = 30% of the total sample = UPREGULATED FROM D1
S = 30% of the total sample = DOWNREGULATED FROM D1

***DAY 3
RNA in dead muscle cells has completely vanished
Fibroblasts are still alive
- Dead muscle cells contain nothing
- Fibroblasts contain 1 unit of F, 1 unit of U and 1 unit of S

You measure:
M = 0% of the total sample = DOWNREGULATED FROM D2, ABSENT HEREAFTER
F = 33% of the total sample = UPREGULATED FROM D2, THEN PLATEAU
U = 33% of the total sample = UPREGULATED FROM D2, THEN PLATEAU
S = 33% of the total sample = UPREGULATED FROM D2, THEN PLATEAU

On day 7 the fibroblasts die

***DAY 8
All cells now dead, fibroblast mRNAs now decaying
- Fibroblasts contain 2/3 unit of F, 0 units of U and 1/2 unit of S

You measure:
U = 0% of the total sample = DOWNREGULATED FROM D7, ABSENT HEREAFTER
F = 57% of the total sample = UPREGULATED FROM D7
S = 43% of the total sample = UPREGULATED FROM D7

***DAY 9
All cells now dead, fibroblast mRNAs almost completely decayed
- Fibroblasts contain 1/3 unit of F, 0 units of U and 0 units of S

You measure:
F = 100% of the total sample = UPREGULATED FROM D8
S = 0% of the total sample = DOWNREGULATED FROM D8

So, what are the final profiles?
* U shows an initial dip, then a slow rise, then a plateau, then a sudden fall to zero.
* S fluctuates, showing an initial rise, then a dip, then another rise, then finally disappearing.
* M shows a slow fall which accelerates down to zero
* F increases in a jerky non-linear fashion to very high levels, just before it vanishes.

And note - this model exhibits all that complexity using just four genes in two cell types. I have done a lot of expression profiling on mixed populations (my area is testis development), and I assure you these results are virtually uninterpretable without a good fundamental understanding of the detailed cellular composition of your sample at each stage.
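For anyone who wants to play with the toy model, here is a minimal sketch that recomputes each gene's share of the sample day by day. It assumes linear decay to zero over the stated number of days, which is what the per-cell amounts listed above (2/3, then 1/3, then nothing) imply; the function and variable names are hypothetical.

```python
from fractions import Fraction

# Days for each transcript to decay to zero after its cell dies
# (ASSUMPTION: decay is linear, as the per-cell amounts above imply).
DECAY_DAYS = {"U": 1, "S": 2, "M": 3, "F": 3}

# Genes expressed by each cell type, and the day on which that cell type dies.
CELL_TYPES = {
    "muscle": {"genes": ("M", "U", "S"), "death_day": 0},
    "fibroblast": {"genes": ("F", "U", "S"), "death_day": 7},
}


def remaining(gene, day, death_day):
    """Units of transcript per cell: 1 while alive, then linear decay to 0."""
    days_dead = max(0, day - death_day)
    return max(Fraction(0), 1 - Fraction(days_dead, DECAY_DAYS[gene]))


def measured_shares(day):
    """Each gene's share of all transcript still present in the mixed sample."""
    totals = {gene: Fraction(0) for gene in DECAY_DAYS}
    for cell in CELL_TYPES.values():
        for gene in cell["genes"]:
            totals[gene] += remaining(gene, day, cell["death_day"])
    grand_total = sum(totals.values())
    return {gene: amount / grand_total for gene, amount in totals.items()}


# Stop at day 9: by day 10 no transcript is left and the shares are undefined.
for day in (0, 1, 2, 3, 7, 8, 9):
    shares = measured_shares(day)
    print(day, {gene: f"{float(share):.1%}" for gene, share in shares.items()})
```

Changing the death days or decay times in the two dictionaries is enough to generate quite different apparent "expression profiles", with no transcription after cell death anywhere in the model.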

"Reply: We are confused by the comment about how we report RNA concentrations."

Your table S2 purports to show a consistent increase in the amount of mRNA in the liver from time zero through to 48 hours. In my experience the precipitation step in your RNA extraction is not quantitative in the first place. Even if the extraction itself is quantitatively accurate, it may be simply that you processed a larger chunk of liver for the 48 hour samples than for the time zero samples.

You *have* to give your figures as amount of RNA per mg of tissue input, rather than per µl of lysate. Even then, it is very likely that as interstitial tissue fluids drain / pool / evaporate following death, the same weight of input tissue will contain more actual cells.

"Reply: We do not see this as a confounding factor. It occurs."

OK, so let's say you have a liver biopsy at time zero which is full of blood. Very high levels of globin mRNA from all the red blood cells, plus immunoglobulin genes from the B and T cells, etc. Hepatocyte mRNA is only a fraction of the total.

Now, take a sample a couple of hours later after much of the blood has drained out. Your globin mRNAs and immune-genes go down, because you no longer have as many red or white blood cells in the sample. All the hepatocyte genes go up, because there is now a higher proportion of these cells in the sample.


You will measure this as potentially quite dramatic changes in gene abundance, which you are in turn interpreting as transcriptional regulation. All it is really telling you is that (1) Mammals have blood, and (2) gravity exists.
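To put illustrative numbers on the blood example, here is a minimal sketch in which the transcripts per cell never change and only the number of blood cells in the biopsy falls. All cell counts, expression levels and gene names are hypothetical, chosen purely for illustration.

```python
# HYPOTHETICAL values: transcripts per cell are held constant throughout.
PER_CELL = {
    "hepatocyte": {"hepatocyte_gene": 10},
    "blood": {"globin": 10},
}


def bulk_shares(cell_counts):
    """Share of each gene in a bulk extract made from the given cell counts."""
    totals = {}
    for cell_type, count in cell_counts.items():
        for gene, level in PER_CELL[cell_type].items():
            totals[gene] = totals.get(gene, 0) + count * level
    grand_total = sum(totals.values())
    return {gene: amount / grand_total for gene, amount in totals.items()}


# Same hepatocytes in both samples; most of the blood has drained from the second.
time_zero = bulk_shares({"hepatocyte": 100, "blood": 100})
drained = bulk_shares({"hepatocyte": 100, "blood": 20})

for gene in time_zero:
    print(f"{gene}: {time_zero[gene]:.0%} -> {drained[gene]:.0%}")
```

The hepatocyte gene appears to go up and globin appears to go down, although no cell in this sketch has changed its transcription.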

You absolutely have to know what you are putting into your experiment, and what your RNA sample was actually made from!

To sum up: there are two principles that should be applied to the interpretation of ALL transcriptomic studies looking at whole tissue extracts.


1) You are profiling a mixed collection of multiple cell types. If the cellular composition of the tissue changes between the samples, then this will cause changes in transcript abundance that have nothing to do with genuine transcriptional regulation.

Therefore, if the cellular composition of the tissue of interest changes between your samples, it is VERY HARD to distinguish between genuine transcriptional regulation, and uninteresting passive consequences of the change in cell numbers. To have even a chance of doing so, you have to either
(a) quantitate the cellular makeup of each sample and adjust for this, or
(b) isolate specific cell types - e.g. by flow sorting - and profile them individually

(a) would be easiest to do histologically, but in some cases you can recover elements of this data from the final RNA profile. For example, if you know that genes A and B are only expressed in one cell type, while genes C and D are only expressed in a different cell type, then you can use these genes as "tracers" to measure the abundance of each cell type. Conversely, changes in the expression level of A relative to B, or C relative to D, represent genuine transcriptional changes within each cell type.
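The tracer idea can be sketched as follows, using two cell types with two marker genes each. The per-cell expression levels and cell counts are hypothetical, and the tracer fraction only equals the true cell fraction under the stated assumption that both cell types carry the same total tracer signal per cell.

```python
def bulk_profile(cell_counts, per_cell):
    """Bulk abundance of each gene: cell count times per-cell expression."""
    profile = {}
    for cell_type, count in cell_counts.items():
        for gene, level in per_cell[cell_type].items():
            profile[gene] = profile.get(gene, 0.0) + count * level
    return profile


def tracer_summary(profile):
    """Separate the composition signal from within-cell-type ratios."""
    type1 = profile["A"] + profile["B"]  # tracers for the first cell type
    type2 = profile["C"] + profile["D"]  # tracers for the second cell type
    return {
        # ASSUMPTION: equal total tracer signal per cell in both cell types.
        "type1_fraction": type1 / (type1 + type2),
        "A_over_B": profile["A"] / profile["B"],
        "C_over_D": profile["C"] / profile["D"],
    }


# HYPOTHETICAL per-cell expression, unchanged between the two samples.
per_cell = {"type1": {"A": 2.0, "B": 1.0}, "type2": {"C": 1.0, "D": 2.0}}

before = tracer_summary(bulk_profile({"type1": 50, "type2": 50}, per_cell))
after = tracer_summary(bulk_profile({"type1": 80, "type2": 20}, per_cell))
print(before)
print(after)
```

Here the tracer fraction shifts with the change in cell numbers while A/B and C/D stay put; a change in either ratio would instead point to a change within that cell type.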


2) Almost all RNA quantitation methods measure the level of mRNA in the cytoplasm. This reflects the balance between historic transcription levels and the rate of mRNA degradation, and is poorly correlated with active transcription. Demonstrating de novo transcription is also a VERY HARD problem.

The gold standard method is RNA FISH, which directly detects the nascent transcripts within the nucleus on a per-cell basis. This technique however is very taxing, hard to compare between studies, and not amenable to global transcription profiling.

A possible approach would be to label newly-synthesised mRNA - e.g. by incorporation of ethynyluridine (EU) - and then purify the newly-synthesised mRNA for expression profiling. There is a commercial kit available for this, but I don't know if it has been applied to whole tissue samples or just to cell lines.


" ...specific gene transcripts cannot be stable at one time, unstable the next time, and then stable again [...] One can only conclude that the mRNA is being synthesized."

As covered above - the multiple levels of confounding factors in this analysis can easily mimic this pattern, or any other pattern you choose to name. At the risk of repeating myself:

1) Highly stable transcripts will show an artifactual peak after the cell dies, because the unstable transcripts vanish and only the stable transcripts remain.

2) If there is more than one cell type present, and these die at different times, then you will get more than one artifactual peak in the profile for the stable transcripts.

In the spirit of usefulness, here are some references that may be helpful.
These two demonstrate that individual cells can survive for a very long time after the organism dies. There are many more such papers, going back to the 70s and earlier.
These two look at differential survival of various cell types - one study focusing on blood, and another on the inner ear. The latter study showed that stem cells survive better than terminally differentiated cells. This is biologically unsurprising, as stem cells in an adult organism are in general maintained in quiescent "tick-over" mode. This is a possible explanation for the observed upregulation of Hox genes and other early embryonic genes: if the stem cells are the last ones to die, then the global transcriptome will become more "stem-like".
This one looked at the rate of mRNA degradation in two specific cell types (chondrocytes from arthritis sufferers versus control), and showed that different transcripts degrade at different rates, and this varies between cell types.
This one looks at mRNA decay rates in mouse brain and liver post mortem (you might want to compare their liver data to yours if the time points match). Importantly, they found that none of the upregulated genes could be validated by quantitative PCR, while the downregulated genes did validate.
