Open preprint reviews by Benjamin Schwessinger

Comparative analysis highlights variable genome content of wheat rusts and divergence of the mating loci.

Christina A Cuomo, Guus Bakkeren, Hala Badr Khalil, Vinay Panwar, David Joly, Rob Linning, Sharadha Sakthikumar, Xiao Song, Xian Adiconis, Lin Fan, Jonathan M Goldberg, Joshua Z Levin, Sarah Young, Qiandong Zeng, Yehoshua Anikster, Myron Bruce, Meinan Wang, Chuntao Yin, Brent McCallum, Les J Szabo, Scot Hulbert, Xiaming Chen, John P. Fellers

Overall, I enjoyed reading the manuscript "Comparative analysis highlights
variable genome content of wheat rusts and divergence of the mating loci" by
Cuomo et al. I am especially excited by the mating-type analysis of wheat rust
fungi, something I have started doing myself for Puccinia striiformis f. sp.
tritici. There is a clear lack of understanding of mating types in rust fungi
and of their role during infection and in generating genetic diversity.
I am somewhat disappointed by the lack of detail in the Materials and Methods
(MM) section, especially regarding how the genome analyses were performed. I had
hoped to learn something for my own project, but this was not the case. I
encourage the authors to describe their analyses in more detail and to provide
scripts for reproducibility and teaching purposes.

Line numbers would be really helpful during the review process.

Please find my specific comments on the manuscript below.
Major changes (++) Minor changes (+)

Nicely summarizes the main finding of the manuscript

page 4: "When biologically stressed, the fungus enters the sexual cycle and
survival teliospore structures are produced" This needs a reference. I am not
sure this holds true for centers of origin, such as the Himalayan region, where
the sexual host is continuously present and rusts reproduce sexually all the
time.
"These complex interactions result in the production of up to five different
rust spore types, requiring very discrete developmental programs, resulting in
altered gene expression profiles." This needs a reference.
page 5:
"Genetic studies by crossing individual strains is not trivial due to the
difficulty of breaking teliospore dormancy in order to infect the alternate
hosts." The authors should consider the recent work by Rodriguez-Algaba et al. 2014.
"Wheat leaf rust, caused by Puccinia triticina Eriks (Pt), is the most commonly
occurring cereal rust disease worldwide." This needs a reference.
-> Why are teliospores called a "survival structure"?
-> The authors are clearly experts in fungal mating types. An illustration of
the different mating-type proteins (pheromone, pheromone receptor and
homeodomain-containing transcription factors) would help readers less familiar
with the field.

page 10: "The assembly of Pst totaled 117.31 Mb; this is comparable to
previously reported values (Cantu et al. 2011; Zheng et al. 2013)." This
statement is not correct. The assemblies in Cantu et al. 2011 and 2013 describe
only contigs, and their total length is around 50-70 Mb.
-> I wonder how the high number of Ns in the scaffolds, especially for Pst,
influences the interpretation of relative repeat content in the three genomes.
My understanding is that Ns most likely correspond to repeats that cannot be
assembled with short-read sequencing technologies. Please comment.
-> The authors should consider including information from Cantu et al. 2013
when discussing the analysis of Pst.
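To make the concern about Ns concrete, here is a minimal Python sketch using purely hypothetical numbers (not values from the manuscript) of how the reported repeat fraction shifts with the choice of denominator. It assumes gaps are stored as runs of N in the scaffolds; `count_gap_bases` and `repeat_content_bounds` are illustrative helper names, not part of any published pipeline. If most gaps do hide unassembled repeats, the true repeat fraction would lie between the lower and upper bounds, and the assembly with the most Ns would be the most affected.

```python
def count_gap_bases(scaffolds):
    """Count gap bases (N or n) across a list of scaffold sequences."""
    return sum(seq.upper().count("N") for seq in scaffolds)


def repeat_content_bounds(total_length, gap_bases, repeat_bases):
    """Return repeat content under three hypothetical assumptions.

    lower:    annotated repeats / full scaffold length (gaps treated as non-repeat)
    resolved: annotated repeats / non-gap length (gaps ignored entirely)
    upper:    (annotated repeats + gaps) / full length (every gap assumed repeat)
    """
    resolved_length = total_length - gap_bases
    if resolved_length <= 0:
        raise ValueError("assembly contains no resolved (non-N) bases")
    return {
        "lower": repeat_bases / total_length,
        "resolved": repeat_bases / resolved_length,
        "upper": (repeat_bases + gap_bases) / total_length,
    }


# Hypothetical example: 100 Mb of scaffolds, 20 Mb of Ns, 40 Mb annotated as repeats.
bounds = repeat_content_bounds(100_000_000, 20_000_000, 40_000_000)
print(bounds)
```

With these hypothetical inputs the three estimates are 40%, 50% and 60%, so a comparison across genomes with different gap fractions could change depending on which denominator is used.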
The part assessing assembly quality with regard to heterozygosity could be
explained better. It starts with "Regions of high heterozygosity could carry
enough differences to prevent haploid assembly and could inflate the gene count
for such regions, as alleles would appear as duplicated genes."
-> I encourage the authors to be more careful with the following statement:
"Overall this suggests that independent assembly of both haplotypes is minimal
in all three wheat rust pathogens, as expected given the choice of assembly
strategies that take". It might well be that conserved genes are less
heterozygous in general, which could confound the authors' analysis. Indeed, the
authors were able to identify two alleles for their mating-type HD genes, which
might well indicate independent assembly of both haplotypes; the authors
actually suggest this interpretation themselves. In addition, simple
presence/absence polymorphisms between the two haplomes will also confound this
analysis. Please
-> The nomenclature for candidate secreted effector proteins (CSEPs) is not
consistent across pages 14-17; sometimes they are referred to as effectors.
Please be consistent.
-> It is nice to see expression data on the alternate host incorporated in this
study.
-> The mating-type analysis would benefit from identifying SNPs in the STE and
mfa sequences. Are these loci heterozygous or homozygous? The latter could
indicate that the genes are present in only one haplome and not the other.
-> "Pt HD genes are functional in U. maydis" might be an overstatement. One
HD-domain protein can substitute for its U. maydis ortholog, but when both rust
orthologs were expressed in a double knockout (e.g. Uh553 (a1 b0)) the mating
type could not be rescued.
-> The section "Pt mating-type genes are functional during wheat infection"
needs a control showing that the mating-type genes are actually targeted within
Pt (e.g.

"Notably, we find that Pst has the highest level of heterozygosity and that this
measure is larger than previously reported (Zheng et al. 2013). While some of
this difference could be attributed to the isolate sequenced, the much larger
size of the Pst-130 genome used in this previous study may result in an
under-estimation of heterozygosity, such as in cases where both alleles of a
gene were assembled independently." The reference Zheng et al. 2013 refers to
isolate CY32, not Pst-130. The Pst-130 assembly is actually only 65 Mb. Please
correct this statement.
-> I really enjoyed the discussion.

The MM relating to pages 11-16 could be improved, in particular the following
points. ++
-> The ortholog and synteny analyses are completely missing from the MM.
-> The heterozygosity analysis could be improved by providing details on the
settings used for BWA. It is unclear how genic and intergenic SNP rates were
calculated.
-> The part assessing assembly quality with regard to heterozygosity could be
explained better. No detail is provided on this part in the MM section.
-> Any MM relating to "Core protein comparisons and orthology" is missing.
-> It would be great if the authors provided their CSEP annotation pipeline and
scripts.
-> Providing the alignment used for the phylogenetic tree in Figure 4 would be
helpful.
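As an illustration of the level of detail requested for the genic versus intergenic SNP rates, here is a minimal Python sketch of one plausible calculation. It is not the authors' method, which is exactly what the MM section should state. It assumes heterozygous SNP positions have already been called on a single contig (for example from BWA alignments followed by a variant caller), that gene models are given as 0-based, half-open (start, end) intervals, and that `merge_intervals` and `snp_rates_per_kb` are hypothetical helper names. Overlapping gene models are merged first so that no base is counted twice, and rates are reported as SNPs per kb of genic or intergenic sequence.

```python
from bisect import bisect_right


def merge_intervals(intervals):
    """Merge overlapping or touching 0-based, half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(start, end) for start, end in merged]


def snp_rates_per_kb(contig_length, genes, snp_positions):
    """Return (genic, intergenic) heterozygous SNP rates per kb for one contig."""
    merged = merge_intervals(genes)
    genic_bases = sum(end - start for start, end in merged)
    intergenic_bases = contig_length - genic_bases
    starts = [start for start, _ in merged]

    genic_snps = 0
    for pos in snp_positions:
        # Find the last merged gene interval starting at or before this SNP.
        idx = bisect_right(starts, pos) - 1
        if idx >= 0 and pos < merged[idx][1]:
            genic_snps += 1
    intergenic_snps = len(snp_positions) - genic_snps

    genic_rate = 1000 * genic_snps / genic_bases if genic_bases else 0.0
    intergenic_rate = 1000 * intergenic_snps / intergenic_bases if intergenic_bases else 0.0
    return genic_rate, intergenic_rate


# Hypothetical toy contig: 10 kb with two overlapping gene models and four SNPs.
print(snp_rates_per_kb(10_000, [(0, 1_000), (500, 2_000)], [10, 1_999, 2_000, 5_000]))
```

For this toy input the genic rate is 1 SNP/kb (2 SNPs in 2 kb) and the intergenic rate is 0.25 SNPs/kb (2 SNPs in 8 kb); position 2,000 counts as intergenic because the intervals are half-open. Filtering thresholds (read depth, mapping and genotype quality) and how gap bases enter the denominators are the kind of details that would need to be reported.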
