--- title: "Genotype plus Genotype-By-Block-of-Environments Biplots" author: "Kevin Wright" date: "`r Sys.Date()`" bibliography: gge.bib output: rmarkdown::html_vignette: vignette: > %\VignetteIndexEntry{Genotype plus Genotype-By-Block-of-Environments Biplots} %\VignetteEncoding{UTF-8} %\VignetteEngine{knitr::rmarkdown} --- # Abstract A short note about GGE and GGB biplots. # R setup ```{r setup, message=FALSE} library("knitr") knitr::opts_chunk$set(fig.align="center", fig.width=6, fig.height=6) options(width=90) ``` An example of a GGE (genotype plus genotype-by-environment) biplot similar to figure 12 of @yan2006biplot. The `flip` argument can be used to flip the x and y axes so that biplots are oriented as desired. Because the SVD factorization is not unique, ```{r gge} library(agridat) data(yan.winterwheat) dat1 <- yan.winterwheat library(gge) m1 <- gge(dat1, yield~gen*env, scale=FALSE) biplot(m1, main="yan.winterwheat - GGE biplot", flip=c(1,0), origin=0, hull=TRUE) ``` Many people prefer to use 'standardized' biplots, in which the data for each environment has been centered and scaled. For standardized biplots, a unit circle is drawn. Environment vectors that reach out to the unit circle are perfectly represented in the two dimensional plane. ```{r} m2 <- gge(dat1, yield~gen*env, scale=TRUE) biplot(m2, main="yan.winterwheat - GGE biplot", flip=c(1,1), origin=0) ``` As seen above, the environment vectors are fairly long, so that relative performance of genotypes in environments can be assessed with reasonable accuracy. In contrast, a biplot based on principal components 2 and 3 has shorter vectors which should not be interpreted. ```{r} biplot(m2, main="yan.winterwheat - GGE biplot - PC 2 & 3", comps=c(2,3), flip=c(1,1), origin=0) ``` @laffont2007numerical showed how to partition the sums-of-squares simultaneously along the principal component axes and along 'G' and 'GxE' axes. ```{r mosaic} plot(m1, main="yan.winterwheat") ``` The mosaic plot above shows that the first principal component axis is capturing almost all of the variation between genotypes, so that a projection of the genotype markers onto the first principal component axis is a good overall representation of the rankings of the genotypes. @laffont2013genotype presented GGB (genotype plus genotype-by-block of environments) biplots, which are useful to enhance the view of mega-environments consisting of multiple locations. ```{r ggb} library(agridat) data(crossa.wheat) dat2 <- crossa.wheat # Define mega-environment groups of locations dat2$eg <- ifelse(is.element(dat2$loc, c("KN","NB","PA","BJ","IL","TC", "JM","PI","AS","ID","SC","SS", "SJ","MS","MG","MM")), "Grp1", "Grp2") library(gge) # Specify env.group as column in data frame m3 <- gge(dat2, yield~gen*loc, env.group=eg, scale=FALSE) biplot(m3, main="crossa.wheat - GGB biplot") ``` # How to modify the "focus" of a biplot Let `X` be a genotype-by-environment matrix. Let the Singular Value Decomposition be `X = USV'`. Let the NIPALS decomposition be `X = TLP'`. DANGER, some algorithms do not factor L out of T. ```{r , eval=FALSE} library(agridat) library(reshape2) library(nipals) dat3 <- agridat::yan.winterwheat dat3 <- acast(dat3, gen~env, value.var="yield") dat3 <- scale(dat3, center=TRUE, scale=FALSE) Xsvd <- svd(dat3) Xnip <- nipals(dat3, center=FALSE, scale=FALSE) U <- Xsvd$u S <- diag(Xsvd$d) V <- Xsvd$v T <- Xnip$scores Lam <- diag(Xnip$eig) P <- Xnip$loadings ``` ## Biplots with genotype focus To obtain a *genotype-focused* biplot the eigenvalues are associated with `U`. The genotype coordinates are can be obtained from the SVD using the first two columns of `U*S` or equivalently from NIPALS `T*Lam`. ```{r, eval=FALSE} (U %*% S)[ , 1:2] (T %*% Lam)[ , 1:2] ``` The environment coordinates are the first two columns of `V` (from the SVD) or `P` (from NIPALS). ```{r , eval=FALSE} V[ , 1:2] P[ , 1:2] ``` ## Biplot with environment focus To obtain an *environment-focused* biplot the eigenvalues are associated with `V`. The genotype coordinates are the first two columns of `U` (from SVD) or `T` (from NIPALS). ```{r , eval=FALSE} U[ , 1:2] T[ , 1:2] ``` The environment coordinates are `S*V` (from SVD) or `Lam*P` (from NIPALS). ```{r , eval=FALSE} (S %*% V)[ , 1:2] (Lam %*% P)[ , 1:2] ``` ## Comments on biplots Note that GGE biplots are environment-focused. In particular, this provides the interpretation that the correlation of genotype performance in two environments is approximated by the cosine of the angle between the vectors for those two environments. The SVD and NIPALS methods provide the same principal components for complete-data, except that a principal component from SVD and the corresponding principal component from NIPALS might point in opposite directions (differ by a factor of `-1` as in some of the examples above. The corresponding biplots would therefore be mirror-reversed along that component. For biplots from SVD and NIPALS that are visually consistent, each principal component can be directed to point in a direction that is positively correlated with the overall genotype means. In other words, if the correlation of the genotype means and the ordinate of the genotypes along the principal component is negative, the principal component is multiplied by `-1`. As with all biplots, the environment vectors can be arbitrarily scaled so that the genotypes and environments uses a similar amount of area on the plot. The algorithm that physically centers the biplot and scales it on the page is not perfect and has opportunities for improvement. ## References