Review for "Population genomics of the Anthropocene: urbanization is negatively associated with genome-wide variation in white-footed mouse populations"

Completed on 11 Feb 2016 by Andrew Whitehead . Sourced from http://biorxiv.org/content/early/2015/11/26/025007.

Login to endorse this review.


Comments to author

The discussion for the Ecological Genomics (ECL 243) class on January 25th, 2016 focused on two papers exploring the effects of urbanization on white-footed mouse Peromyscus leucopus genomics: 1) Munshi-South et al. 2015. Population genomics of the Anthropocene: urbanization is negatively associated with genome-wide variation in white-footed mouse populations, and 2) Harris et al. 2015. Urbanization shapes the demographic history of a city-dwelling native rodent.

Both studies used the same genomic dataset from mice collected from 23 sites located on the urban-rural gradient within and around New York City. The sites were chosen based on qualitative features and categorized as urban, suburban, or rural. At each site, they aimed to collect tissue from 10 mice to use for genomic analysis. The migration distance of the mice was taken into account for establishing distance between sites. The authors used a within site sampling design to minimize capturing relatives. Our class concluded that their efforts to take the biology of mice into account for their sampling design was appropriate and well thought out. However, we were concerned that the way they classified locations as rural, suburban, or urban was not well defined, and had the potential to bias their results. We discuss this in more detail below.

The authors sequenced the tissue samples using a ddRAD protocol, which is an alternative to past methods used by this research group (e.g. microsatellites). We discussed the nuts and bolts of ddRAD sequencing, and its efficacy compared to the authors’ past methods. ddRAD sequencing utilizes restriction enzymes to cut DNA fragments at thousands of sites across the genome, and produces sequence tags are around 100 base pairs long. It provides many more loci for analysis and more complete sampling of the genome than microsatellite markers. This gives ddRAD more power to detect fine scale population structure. RADseq methods also tend to be superior to microsatellite methods for demographic history modelling because of the greater amount of data, but also because microsatellite evolution and mutation rates can be drastically different from genome-wide averages. We concluded the ddRAD methods in the paper were appropriate and superior to the authors’ past methods.

Our class wondered if the restriction sites targeted with ddRAD sequencing are randomly distributed across the genome. While ddRAD offers a more comprehensive sampling of the genome that microsatellites, ddRAD is not without biases. For example, RAD-seq methods can be biased towards euchromatin because restriction enzymes are more efficient at cutting euchromatin sites than heterochromatin sites. This could potentially inflate the number of RAD tags derived from coding and regulatory regions, which could reduce the amount of observed variation compared to a scenario where the genome is more neutrally sampled. However, the benefits of using ddRAD probably far outweighed the disadvantages for this study. We concluded that ddRAD allowed the authors to explore the genome more comprehensively than microsatellites would have allowed, and they were able to sequence a larger sample size than they probably could have using more expensive genomic techniques.

Our class analyzed the data filtering and SNP calling methods employed in the paper. The authors used stringent and conservative filters for data analysis and SNP calling. They used Stacks to de novo align RAD loci to each other because the white footed mouse does not have a reference genome (some of us learned that new world mice are very far removed from old world mice), and required a stack depth of 7 in their catalog assembly. We noted that the optimal stack depth for catalog assembly is highly dependent on average sequencing depth of the samples. The authors didn’t report their average sequencing depth, so it is difficult to evaluate if a stack depth of 7 was appropriate.

After catalog assembly, the authors filtered out loci that didn’t occur in at least 22 out of their 23 populations. This seems very stringent, and we would have been interested to hear justification for why they chose this particular threshold. They went from 880,898 RAD loci passing their quality and depth requirements, to only 14,930 loci after requiring presence in 22 out of 23 populations. Why not 21 or 20 populations? Lowering this threshold even a little may have substantially increased the number of RAD loci used for analysis without causing significant bias. They also required that if a locus was present in a population, it was present in at least 50% of the individuals sampled from that population. This seems appropriate. After calling SNPs, they only used one from each RAD locus. Our class concluded that this is a very reasonable step to take since SNPs from a single RAD locus are very likely in linkage disequilibrium.

We also concluded it was appropriate for them to remove individuals with very low numbers of reads and remove one individual from each pair of related individuals. They were left with 191 out of 233 individuals. Earlier in the paper, they stated their attempt to collect 10 individuals from each sampling site. After removing low-quality or related samples, they were left with fewer than 10 samples from each site. While it probably says somewhere in the supplemental material how many samples came from each site, it would be nice to have a range of per site sampling sizes stated in the methods section. For their subsequent analysis, having 7 or 8 individuals from all sites would be quite a bit different than having 10 individuals from some sites and only 2 or 3 individuals from others.

It was brought to our attention that other readers of this paper took issue with the authors’ description of data filtering. The language led the readers to conclude the filtering methods would result in the loss of many low-frequency alleles, but the authors’ actual intent was to explain how much missing data they were accommodating. After carefully reviewing the filtering steps described in both papers, we were unable to determine where the authors may have been trying to explain their accommodation of missing data. We agree that the stringent filtering methods and the low number of individuals from each site could underestimate low frequency allele prevalence. This is an important issue because loss of low frequency allele data can bias results, especially in the case of demographic modeling such as that performed in Harris et al. We suggest the authors reword some of the passages describing their filtering steps if their intent was indeed to explain their accommodation of missing data.

Munshi-South et al. 2015 hypothesized that genetic diversity within white-footed mouse populations would decrease along a rural to urban gradient, and an isolation by environment model would explain genetic differentiation between populations better than an isolation by distance model. They generally found the results they expected to, but our class had a few questions and concerns regarding their methods.

At the beginning of the paper, the authors defined their sampling sites as rural, suburban, or urban. However, their explanation of those classifications is minimal. Figure 1b plots the human population size of each site versus its percent impervious surface cover. Most urban locations have large human population sizes as well as large percentages of impervious surface cover. However, there are some obvious exceptions. The WW site is classified as rural, but has both higher human population size and higher percent impervious surface cover than the JB site, which is classified as urban. Similarly, some suburban sites are higher in both categories than some urban sites, while others are lower in both categories than some rural sites. Most of the modeling in the paper incorporates percent impervious surface cover and/or human population size rather than the urban/suburban/rural classification. However, the results section reports the population genomic statistics with respect to the urban/suburban/rural classification and the authors draw conclusions from this.

When compared to the quantitative measurements, we concluded the classification system seems self-contradictory and suspect. In the discussion, the authors note such classifications are common in urban ecology studies but are not standardized, and they call for future studies to report percent impervious surface cover. However, they themselves draw conclusions from the classification system. The authors probably had good reasons for using both the classification system and the more quantitative human population size/percent impervious surface measurements, but those reasons don’t come across clearly in the paper and the effect is confusing.

We also had some concerns about their treatment and identification of outliers. The authors identified four outliers from the population genomic statistics analysis. These outliers include two rural sites with low genetic diversity, one suburban site with low genetic diversity, and one urban site with high genetic diversity. These sites represent more than 1/6 of the sampling locations, which seems like a high percentage. To their credit, they continue using those data points in their modeling analysis. However, they drop the outliers in their report of diversity statistic ranges, and fail to address why those populations may differ in their diversity from other similar populations. Since the outliers represented a substantial portion of the data points, we feel it would have been appropriate to discuss some hypotheses that could explain why those populations differed from the others and why they shouldn’t be included in the report of diversity statistics.

We approved of their acknowledgement of their mixed results regarding their isolation-by-environment and isolation-by-resistance analysis. However, they suggested the urban populations’ deviance from global allele frequencies was a probable explanation for why BEDASSLE didn’t didn’t confirm the results of the partial Mantel test that indicated impervious surface is associated with genetic differentiation. They stated in their methods that the beta-binomial model in BEDASSLE should correct for this, and it is unclear why they didn’t trust the results of the beta-binomial BEDASSLE model.

Harris et al. 2015

found differentiation in the mice populations congruent with the geologic split between Long Island and the mainland following end-of-ice-age level rise, and subsequent differentiation congruent with the time that urbanization began in New York (~400 years ago). The class found the results compelling, but we questioned some of the methodology used in the study.

Our major criticism of the paper is that the main conclusions are based on divergence dates without a justified estimate of mutation rates for white-footed mice (estimation of divergence times requires an estimate of the mutation rate). Mutation rates can vary by species, population, and through time, and commonly accepted methods of estimating mutation rates (e.g. from substitution rate estimates) have biases that can affect divergence time inferences (see Obbard et al. 2012). Thus, the particular mutation rate used directly influences the estimate of divergence time of two populations. Because of this, divergence time estimates should be calculated using a biologically relevant mutation rate, and statistical packages used to calculate divergence times (including fastsimcoal2, used in this paper) require defining the mutation rate. However, in this paper, the methods don’t specify how an estimated mutation rate was chosen for the model input. It may have been outside the scope or means of this study to experimentally determine the mutation rate of Peromyscus leucopus, but since the mutation rate can have such a significant impact on the results, discussion and justification of an estimated mutation rate should have been included. Given the assumption that the mutation rate estimate was not specific to the white-footed mouse, a sensitivity analysis could be helpful to show how robust the results and conclusions are.

Given the assumption that their estimated mutation rate is within a small range of the true mutation rate, we agree it makes sense that strong differentiation would be associated with the geologic split between Long Island and the mainland. However, some classmates were skeptical of the conclusion that the onset of urbanization 400 years ago would impose a strong differentiation on the populations. It was pointed out that urbanization 400 years ago looked very different than what urbanization looks like today. Should we expect such a strong signal of differentiation when urbanization in the city first began, or should we expect it to occur after more modern urbanization was in place?

Another point of ambiguity surrounded the STACKS pipeline, which was used for data analysis. Some people pointed out that the STACKS pipeline is sometimes criticized for filtering out a large amount of important information that could be important during demographic modeling. We wondered how their choice of pipeline/analysis affected their results.

Overall, we found the two papers to be interesting demonstrations of the utility of genomic inference in studying past and present populations of native organisms. We are eager to see how follow-up studies in this system, and studies in other systems, reveal anthropogenic effects on genomic differentiation of organisms.