--- title: "Spatial modeling using the sommer package" author: "Giovanny Covarrubias-Pazaran" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Spatial modeling using the sommer package} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- The sommer package was developed to provide R users with a powerful and reliable multivariate mixed model solver for different genetic (in diploid and polyploid organisms) and non-genetic analyses. This package allows the user to estimate variance components in a mixed model with the advantages of specifying the variance-covariance structure of the random effects, specifying heterogeneous variances, and obtaining other parameters such as BLUPs, BLUEs, residuals, fitted values, variances for fixed and random effects, etc. The core algorithms of the package are coded in C++ using the Armadillo library to optimize dense matrix operations common in the derect-inversion algorithms. This vignette is focused on showing the capabilities of sommer to fit spatial models using the two dimensional splines models. **SECTION 1: Introduction ** 1) Background in tensor products **SECTION 2: Spatial models** 1) Two dimensional splines (multiple spatial components) 2) Two dimensional splines (single spatial component) 3) Spatial models in multiple trials at once ## SECTION 1: Introduction ### Backgrounds in tensor products TBD ## SECTION 2: Spatial models ### 1) Two dimensional splines (multiple spatial components) In this example we show how to obtain the same results than using the SpATS package. This is achieved by using the `spl2Db` function which is a wrapper of the `tpsmmb` function. ```{r} library(sommer) data(DT_yatesoats) DT <- DT_yatesoats DT$row <- as.numeric(as.character(DT$row)) DT$col <- as.numeric(as.character(DT$col)) DT$R <- as.factor(DT$row) DT$C <- as.factor(DT$col) # SPATS MODEL # m1.SpATS <- SpATS(response = "Y", # spatial = ~ PSANOVA(col, row, nseg = c(14,21), degree = 3, pord = 2), # genotype = "V", fixed = ~ 1, # random = ~ R + C, data = DT, # control = list(tolerance = 1e-04)) # # summary(m1.SpATS, which = "variances") # # Spatial analysis of trials with splines # # Response: Y # Genotypes (as fixed): V # Spatial: ~PSANOVA(col, row, nseg = c(14, 21), degree = 3, pord = 2) # Fixed: ~1 # Random: ~R + C # # # Number of observations: 72 # Number of missing data: 0 # Effective dimension: 17.09 # Deviance: 483.405 # # Variance components: # Variance SD log10(lambda) # R 1.277e+02 1.130e+01 0.49450 # C 2.673e-05 5.170e-03 7.17366 # f(col) 4.018e-15 6.339e-08 16.99668 # f(row) 2.291e-10 1.514e-05 12.24059 # f(col):row 1.025e-04 1.012e-02 6.59013 # col:f(row) 8.789e+01 9.375e+00 0.65674 # f(col):f(row) 8.036e-04 2.835e-02 5.69565 # # Residual 3.987e+02 1.997e+01 # SOMMER MODEL m1.sommer <- mmer(Y~1+V+ spl2Db(col,row, nsegments = c(14,21), degree = c(3,3), penaltyord = c(2,2), what = "base"), random = ~R+C+ spl2Db(col,row, nsegments = c(14,21), degree = c(3,3), penaltyord = c(2,2), what="bits"), data=DT, tolParConv = 1e-6, verbose = FALSE) summary(m1.sommer)$varcomp # get the fitted values for the spatial kernel and plot # ff <- fitted.mmer(m1.sommer) # DT$fit <- as.matrix(Reduce("+",ff$Zu[-c(1:2)])) # lattice::levelplot(fit~row*col,data=DT) ``` ### 2) Two dimensional splines (single spatial component) To reduce the computational burden of fitting multiple spatial kernels `sommer` provides a single spatial kernel method through the `spl2Da` function. This as will be shown, can produce similar results to the more flexible model. Use the one that fits better your needs. ```{r, fig.show='hold'} # SOMMER MODEL m2.sommer <- mmer(Y~1+V, random = ~R+C+spl2Da(col,row, nsegments = c(14,21), degree = c(3,3), penaltyord = c(2,2)), data=DT, tolParConv = 1e-6, verbose = FALSE) summary(m1.sommer)$varcomp # get the fitted values for the spatial kernel and plot # ff <- fitted.mmer(m2.sommer) # DT$fit <- as.matrix(Reduce("+",ff$Zu[-c(1:2)])) # lattice::levelplot(fit~row*col,data=DT) ``` ### 3) Spatial models in multiple trials at once Sometimes we want to fit heterogeneous variance components when e.g., have multiple trials or different locations. The spatial models can also be fitted that way using the `at.var` and `at.levels` arguments. The first argument expects a variable that will define the levels at which the variance components will be fitted. The second argument is a way for the user to specify the levels at which the spatial kernels should be fitted if the user doesn't want to fit it for all levels (e.g., trials or fields). ```{r, fig.show='hold'} DT2 <- rbind(DT,DT) DT2$Y <- DT2$Y + rnorm(length(DT2$Y)) DT2$trial <- c(rep("A",nrow(DT)),rep("B",nrow(DT))) head(DT2) # SOMMER MODEL m3.sommer <- mmer(Y~1+V, random = ~vsr(dsr(trial),R)+vsr(dsr(trial),C)+ spl2Da(col,row, nsegments = c(14,21), degree = c(3,3), penaltyord = c(2,2), at.var = trial), rcov = ~vsr(dsr(trial),units), data=DT2, tolParConv = 1e-6, verbose = FALSE) summary(m3.sommer)$varcomp # get the fitted values for the spatial kernel and plot # ff <- fitted.mmer(m3.sommer) # DT2$fit <- as.matrix(Reduce("+",ff$Zu[-c(1:4)])) # lattice::levelplot(fit~row*col|trial,data=DT2) ``` ## Literature Covarrubias-Pazaran G. 2016. Genome assisted prediction of quantitative traits using the R package sommer. PLoS ONE 11(6):1-15. Covarrubias-Pazaran G. 2018. Software update: Moving the R package sommer to multivariate mixed models for genome-assisted prediction. doi: https://doi.org/10.1101/354639 Bernardo Rex. 2010. Breeding for quantitative traits in plants. Second edition. Stemma Press. 390 pp. Gilmour et al. 1995. Average Information REML: An efficient algorithm for variance parameter estimation in linear mixed models. Biometrics 51(4):1440-1450. Henderson C.R. 1975. Best Linear Unbiased Estimation and Prediction under a Selection Model. Biometrics vol. 31(2):423-447. Kang et al. 2008. Efficient control of population structure in model organism association mapping. Genetics 178:1709-1723. Lee, D.-J., Durban, M., and Eilers, P.H.C. (2013). Efficient two-dimensional smoothing with P-spline ANOVA mixed models and nested bases. Computational Statistics and Data Analysis, 61, 22 - 37. Lee et al. 2015. MTG2: An efficient algorithm for multivariate linear mixed model analysis based on genomic information. Cold Spring Harbor. doi: http://dx.doi.org/10.1101/027201. Maier et al. 2015. Joint analysis of psychiatric disorders increases accuracy of risk prediction for schizophrenia, bipolar disorder, and major depressive disorder. Am J Hum Genet; 96(2):283-294. Rodriguez-Alvarez, Maria Xose, et al. Correcting for spatial heterogeneity in plant breeding experiments with P-splines. Spatial Statistics 23 (2018): 52-71. Searle. 1993. Applying the EM algorithm to calculating ML and REML estimates of variance components. Paper invited for the 1993 American Statistical Association Meeting, San Francisco. Yu et al. 2006. A unified mixed-model method for association mapping that accounts for multiple levels of relatedness. Genetics 38:203-208. Tunnicliffe W. 1989. On the use of marginal likelihood in time series model estimation. JRSS 51(1):15-27.