Completed on 9 Nov 2014 by Ana I Vazquez.

License: http://creativecommons.org/licenses/by/4.0/

Determination of Nonlinear Genetic Architecture using Compressed Sensing

- Chiu Man Ho, Stephen D.H. Hsu

This paper introduces the method of compressive sensing to reconstruct nonlinear interactions (gene-by-gene interactions, gene-by-environment interactions and epistasis) underlying complex traits in a genomic context. The authors propose their method (a penalized regression using the LASSO penalty) as a way to estimate the linear and non-linear effects of a large set of predictors (SNP variants) when those effects are sparse. The article introduces several very interesting concepts into the genetic context, which are not only worthwhile but also novel. Suggestions to improve the paper follow.
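To make the setup summarized above concrete, here is a minimal sketch of an L1-penalized (LASSO) regression over main effects plus all pairwise interaction terms. It is not the authors' implementation: the 0/1/2 genotype coding, the column centering, the cyclic coordinate-descent solver, the penalty value, and all function names are assumptions chosen for illustration.

```python
import itertools
import random


def soft_threshold(z, t):
    """Soft-thresholding operator induced by the L1 (LASSO) penalty."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def expand_features(genotypes):
    """Return rows holding the main effects followed by all pairwise products."""
    p = len(genotypes[0])
    pairs = list(itertools.combinations(range(p), 2))
    return [list(row) + [row[i] * row[j] for i, j in pairs] for row in genotypes]


def center_columns(X):
    """Subtract each column mean so that no intercept term is needed."""
    n, m = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(m)]
    return [[row[j] - means[j] for j in range(m)] for row in X]


def lasso_cd(X, y, lam, n_iter=200):
    """Minimise (1/2n)||y - Xb||^2 + lam*||b||_1 by cyclic coordinate descent."""
    n, m = len(X), len(X[0])
    beta = [0.0] * m
    resid = list(y)  # residual y - X beta; beta starts at zero
    col_sq = [sum(X[a][j] ** 2 for a in range(n)) / n for j in range(m)]
    for _ in range(n_iter):
        for j in range(m):
            if col_sq[j] == 0.0:
                continue  # constant (all-zero after centering) column
            # Covariance of column j with the partial residual (column j added back).
            rho = sum(X[a][j] * (resid[a] + X[a][j] * beta[j]) for a in range(n)) / n
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for a in range(n):
                    resid[a] -= X[a][j] * delta
                beta[j] = new
    return beta


if __name__ == "__main__":
    random.seed(0)
    n, p = 80, 5  # hypothetical sample size and number of SNPs
    G = [[random.choice([0, 1, 2]) for _ in range(p)] for _ in range(n)]
    X = center_columns(expand_features(G))
    # Hypothetical sparse architecture: one linear effect and one interaction.
    y = [1.0 * g[0] - 0.8 * g[1] * g[2] + random.gauss(0.0, 0.1) for g in G]
    mean_y = sum(y) / n
    y = [v - mean_y for v in y]
    beta = lasso_cd(X, y, lam=0.05)
    labels = [f"snp{i}" for i in range(p)]
    labels += [f"snp{i}*snp{j}" for i, j in itertools.combinations(range(p), 2)]
    for name, b in zip(labels, beta):
        if b != 0.0:
            print(name, round(b, 3))
```

A single penalty `lam` is applied to linear and interaction terms alike in this sketch; whether the manuscript does the same is exactly what should be made explicit.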

Major Compulsory Revisions
1) A review of prior literature published in this domain is largely missing. Considerable work has been done on the use of penalized regression in the genetic analysis of complex traits. For modeling gene-by-gene and gene-by-environment interactions, a variety of approaches including penalized likelihood, hierarchical Bayesian models, Bayesian partitioning algorithms, non-parametric Bayesian methods and machine learning algorithms have already been applied. There is a large body of work on the analysis of genetic data using penalized regression-based approaches such as the LASSO. The efficiency of optimization algorithms for the LASSO, such as least angle regression and coordinate descent, has also been discussed. Papers that review existing methods for modeling genetic interactions include Yi (2010), McKinney et al. (2006), and Park and Hastie (2006); see also Tibshirani (1996) for the LASSO.
2) The methodology and the steps followed are unclear, and the notation is neither consistent nor always introduced. Please also clarify whether the models used to simulate effects on the synthetic genotypes are also used in the pseudo-simulation, where real genotypes (from the 1000 Genomes Project) are used (are eqs. (2.1) and (2.2) also used to generate effects here?).

Minor Essential Revisions
3) Eq. 4.2: I think it should be y^a = sum_i (g^a_i x_i) + e*, where e* is a different residual from e (in Eq. 4.1); the effects x_i here are the allele substitution effects (see Falconer).
4) It is not clear what X^ is; please clarify, as this will also make Eq. 2.4 clearer.
5) In the equation in the discussion (h^2_broad sense), H^2 is the usual symbol for broad-sense heritability, and it should be clarified that var(y) = 1 (or var(y) should be written out in the equation, i.e. (var(y) - var(e))/var(y) = (var(L) + var(NL))/var(y)).
6) One of the principal points of the paper by Hill et al. (2008) is that an additive model captures non-additive interactions as well. The authors, while citing this paper, seem to have bypassed this point (page 1, line 5).
7) The term A in equations 1.1 and 1.2 is changed to g later. Although a note about this change is provided (page 2, line 1), it seems unnecessary to use two separate notations.
8) A classical notation in genomics uses X, x, Z, z for incidence matrices, and Greek letters for effects. We suggest that the authors adapt the manuscript to the classical notation.
9) One of the interesting features of the paper, namely the n > C s log p inequality (page 2, line 10), is simply stated without any citation or elaboration. For example, it is unclear how the constant C is obtained.
10) One of the cases in the simulation considers 3 QTL, 2 of which interact (page 4, line 6). This is a very small number of QTL, unrealistic for most complex traits, and the authors could consider dropping this case.
11) It is unclear if the lambda parameter (page 4, line 18) corresponds to the interaction effects alone. Kindly clarify.
12) The paper by Tanck et al. (2006) applied LASSO penalization with two separate penalties: one for the main effects and one for the interaction effects. Similarly, the models of Bogdan et al. (2004), Baierl et al. (2006), and Manichaikul et al. (2009) also used two separate penalties. It would be interesting to see how the proposed method compares in predictive power with similar methods already published.
13) The “universality class” of random matrices (page 2, line 19 and page 5, line 15) has not been explained. It would be helpful to elaborate on what bearing this has on predictive power.
14) The content displayed in the figures needs to be explained much more clearly. The conclusions drawn from figures 5 and 6, that the universality class of the compressed sensor g is the same for real and synthetic genomes (page 5, lines 10-15), need to be better clarified and justified.
15) Examples of when the g matrix would comprise continuous instead of discrete entries (page 6, lines 1-2) would help illustrate this point better.
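
As an illustration of points 5 and 9, the following minimal sketch evaluates the two quantities involved. It assumes that the linear, non-linear and residual components are uncorrelated, so that var(y) is their sum, and it treats the constant C as a user-supplied value because the manuscript does not say how it is obtained. The function and argument names are hypothetical.

```python
import math


def broad_sense_heritability(var_linear, var_nonlinear, var_error):
    """H^2 = (var(y) - var(e)) / var(y) = (var(L) + var(NL)) / var(y).

    Assumes uncorrelated components: var(y) = var(L) + var(NL) + var(e).
    """
    var_y = var_linear + var_nonlinear + var_error
    return (var_y - var_error) / var_y


def samples_needed(sparsity, n_features, constant):
    """Right-hand side of n > C * s * log(p); C must be supplied by the caller."""
    return constant * sparsity * math.log(n_features)


if __name__ == "__main__":
    # With var(y) normalised to 1, H^2 reduces to var(L) + var(NL).
    print(broad_sense_heritability(0.3, 0.2, 0.5))
    # Hypothetical values: s = 10 non-zero effects, p = 1000 candidate terms, C = 2.
    print(samples_needed(10, 1000, 2.0))
```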

Discretionary Revisions
There are too many figures, and they lack explanation. Reduce them to a few well-presented figures that make stronger points about the behavior of your model.

References

1) Yi, Nengjun. "Statistical analysis of genetic interactions." Genetics research 92.5-6 (2010): 443-459.
2) McKinney, Brett A., et al. "Machine learning for detecting gene-gene interactions." Applied bioinformatics 5.2 (2006): 77-88.
3) Park, Mee Young, and Trevor Hastie. Regularization path algorithms for detecting gene interactions. Department of Statistics, Stanford University, 2006.
4) Tibshirani R. Regression shrinkage and selection via the Lasso. Journal of the Royal Statistical Society. Series B. 1996; 58:267–288.
5) W. Hill, M. Goddard, P. Visscher, Data and Theory Point to Mainly Additive Genetic Variance for Complex Traits, PLoS Genet. Feb 2008; 4(2): e1000008.
6) Tanck M, Jukema J, Zwinderman A. Simultaneous estimation of gene-gene and gene-environment interactions for numerous loci using double penalized log-likelihood. Genet Epidemiol. 2006; 30:645–51.
7) Bogdan M, Ghosh J, Doerge R. Modifying the Schwarz Bayesian information criterion to locate multiple interacting quantitative trait loci. Genetics. 2004; 167:989–99.
8) Baierl A, Bogdan M, Frommlet F, Futschik A. On locating multiple interacting quantitative trait loci in intercross designs. Genetics. 2006; 173:1693–703.
9) Manichaikul A, Moon J, Sen S, Yandell B, Broman K. A model selection approach for the identification of quantitative trait loci in experimental crosses, allowing epistasis. Genetics. 2009; 181:1077–86.

Level of interest: An article of outstanding merit and interest in its field
Quality of written English: Acceptable
Statistical review: Yes, and I have assessed the statistics in my report.
Declaration of competing interests: Dr. Hsu is Vice President of Research at Michigan State University. I have recently accepted a position in that university. Otherwise I declare no other competing interests.

Authors' response to reviews: (http://www.gigasciencejournal.com/imedia/1055861928166366_comment.pdf)