Preprint reviews by David Curtis

Bayesian Integrated Analysis Of Multiple Types Of Rare Variants To Infer Risk Genes For Schizophrenia And Other Neurodevelopmental Disorders

Hoang T. Nguyen, Amanda Dobbyn, Laura M. Huckins, Douglas Ruderfer, Giulio Genovese, Menachem Fromer, Xinyi Xu, Joseph Buxbaum, Christina Hultman, Pamela Sklar, Shaun M. Purcell, Xin He, Patrick F. Sullivan, Eli Ayumi Stahl

Review posted on 19th June 2017

Can I just check I am understanding your results correctly?

You estimate (page 11) that 8% of genes are "risk genes" for schizophrenia. Then, you estimate that the mean relative risk (RR) for a loss of function (LOF) de novo variant in any of these genes is 12.25? But the mean RR for a LOF variant in one of these genes in a case control sample is 2.09 in one cohort, 1.04 in another.

For me, this raises some questions. Firstly, what is the justification for treating de novo and case-control data separately when you are seeking to estimate the effect size of a LOF variant? Either way, the assumption is that the variant causes complete LOF of the gene. Why should it matter to the subject whether or not a parent also possessed the variant? Shouldn't TADA be trying to integrate information from different study types to produce a consistent picture? Likewise, why should the RR for a singleton LOF variant differ from that for a non-singleton LOF variant? Surely, in the model, the effect should be the same? As it stands, it looks as though you might have singleton and non-singleton LOF variants in the same gene that would be assumed to have different effects on risk.

Secondly, my recollection is that no controls possessed a SETD1A LOF variant. With a population risk of 1%, I think this would produce a RR of 100 (i.e. 1/0.01). Is a mean RR a meaningful concept, given both that some genes will have a very high RR and, conversely, that there is a ceiling for the RR, which cannot exceed 100?
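To make the ceiling argument concrete, here is a minimal sketch of the arithmetic. It assumes the RR is measured against the population risk, so that RR = P(affected | carrier) / population risk, and that the largest possible penetrance is 1. The function names are hypothetical and are not taken from the preprint or from TADA.

```python
def max_relative_risk(population_risk: float) -> float:
    """Ceiling on RR when risk in carriers cannot exceed 1 (full penetrance).

    Assumption: RR is defined relative to the population risk.
    """
    if not 0.0 < population_risk <= 1.0:
        raise ValueError("population_risk must be in (0, 1]")
    return 1.0 / population_risk


def implied_penetrance(relative_risk: float, population_risk: float) -> float:
    """Risk in carriers implied by a given RR under the same assumption."""
    penetrance = relative_risk * population_risk
    if penetrance > 1.0:
        raise ValueError("RR exceeds the ceiling for this population risk")
    return penetrance


if __name__ == "__main__":
    K = 0.01  # population risk of 1%, as stated in the review
    print("RR ceiling:", max_relative_risk(K))
    # Mean de novo LOF RR quoted in the review (12.25)
    print("Implied penetrance at RR 12.25:", implied_penetrance(12.25, K))
```

Under these assumptions, a gene in which every carrier is affected sits at the ceiling of 100, while a RR of 12.25 corresponds to a risk in carriers of roughly 12%.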

I am unsure of the effect of errors in genotype-calling on your results. For the de novo variants, false-positive calls far exceed true de novos and so one has to take special steps to validate them. It is unclear how genotyping errors, batch effects and population stratification might affect these results, or how sensitive the results are to the statistical models imposed.

I agree that there seems to be reasonable evidence for SETD1A and TAF13 as risk genes, although basically this evidence derives from the observation of very small numbers of de novo mutations.

However, I feel less enthusiastic about the implication that there are 1500 genes for which a singleton LOF variant will have an average RR around 2. If this were the case, might we not expect to see some gene sets in which there was overall a high RR for singleton LOF variants? Bearing in mind that there are over ten singleton LOF variants per gene, would it not be possible to identify, by one means or another, gene sets which were fairly strongly enriched for these risk genes? It would be good to see the RR presented for each gene set.
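As a rough illustration of why an enriched gene set should show a visibly raised RR, the sketch below treats a gene set as a mixture of risk genes and non-risk genes. It assumes, purely for illustration, that every gene carries singleton LOF variants at the same rate in controls, that non-risk genes have RR 1, and that risk genes share a single RR of 2. The function name and the example fractions are hypothetical.

```python
def gene_set_relative_risk(risk_gene_fraction: float, risk_gene_rr: float) -> float:
    """Set-level RR for a mixture of risk genes and non-risk genes.

    Assumptions (illustrative only): equal control variant rate per gene,
    RR of exactly 1 for non-risk genes, one shared RR for risk genes.
    """
    if not 0.0 <= risk_gene_fraction <= 1.0:
        raise ValueError("risk_gene_fraction must be in [0, 1]")
    return 1.0 + risk_gene_fraction * (risk_gene_rr - 1.0)


if __name__ == "__main__":
    rr = 2.0  # round figure for the "average RR around 2" discussed above
    # 0.08 is the genome-wide proportion of risk genes quoted in the review;
    # the larger fractions are hypothetical enriched gene sets.
    for fraction in (0.08, 0.25, 0.50):
        print(fraction, round(gene_set_relative_risk(fraction, rr), 3))
```

Under these simplifying assumptions, a set in which half the genes are risk genes would have a set-level RR of about 1.5, against about 1.08 for the genome as a whole, which is the kind of difference one might hope to detect.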
