[R-sig-ME] How to extract and understand heterogeneous variances in mixed-effects models

Mon Mar 5 16:39:45 CET 2018

Hi all!

Being my first question to the R-sig-mixed-models mailing list I'm hoping that it is not off-topic.   

I'm fitting a model where I want to allow for heterogeneous variances over a grouping variable in the data. The purpose of the model is to estimate the stability of gene expression and I'm planning to do this using the method described in Serrano et al. 2011. They propose modeling gene expression as the dependent variable, gene ID and experimental grouping variables as fixed effects and the interaction between gene ID and grouping as a random effect. Additional random-effect terms are added to the model to estimate random variation due to e.g. each biological sample etc. The random effect of each gene x grouping is regarded as "bias", deviation from the gene x group average. Additionally, residual variance is estimated and the stability measure is called mean square error (MSE). MSE is calculated as: MSE = bias^2 + variance. The maximum MSE for each gene is regarded as the gene-specific stability measure. A more stable gene would in theory deviate less and show less variance.

Serrano et al. 2011 fits models "using the MIXED procedure of the SAS/STAT statistical package". When trying to implement the method in R I've resorted to nlme::lme function as we can fit models with heterogeneous variances specifying this in varIdent(form = ~1|someGroup).  

I've formulated a model with (what I believe are) crossed random effects and heterogeneous variances over a grouping variable. The random effects are treatment X gene interaction  (a vector made with interaction() containing treatment and gene ID specified as INTERACT below) and subject representing the ID of participants in the study. The weights-argument is specified using the varIdent() function where the grouping variable is the same INTERACT-variable as in the random-effects part. 

The model: 

m1 <- lme(expression ~ gene + treatment, 
           random = list(one1 = pdIdent(~ INTERACT -1),  
                         one2 = pdIdent(~ subject -1)),
           weights = varIdent(form = ~ 1|INTERACT), 
           data = data, method = "ML")

To replicate the method proposed by Serrano et al., the method is set to maximum likelihood.

Reading the literature, the estimated variance parameters from the model "represent the ratio between the standard deviations of
the lth stratum and the first stratum" (Pinheiro & Bates 2000, p. 209). So here comes my questions regarding varIdent (1) and the crossed random effects (2):

1. Am I right when retrieving the group specific variances as:
`m1$sigma^2 * coef(m1$modelStruct$varStruct, uncons = FALSE)^2`    
with the variance for the reference level being `m1$sigma^2 * 1`?

And a follow up question, how are one to understand the group specific variances? Are they "estimated" or "observed"? I guess my question boils down to: what assumptions are made when accepting strata-specific variances?

2. Are my crossed random effects correct (forgive me! a classic question, I know)? My understanding of the method in Gałecki & Burzykowski 2013 (p. 478) is that n variables with all values set to 1 are added to the data.frame

`data <- within(data, one1 <- one2 <- (...) <- one_n <- 1L)`

"crossed" random intercepts can then be added to the model by

`random = list(one1 = pdIdent(~ RANDOMFACTOR1 -1), one2 = pdIdent(~ RANDOMFACTOR2 -1))`

Am I right to understand that this is the analog random-effects structure to lmer's:
`fixed-effect + (1|RANDOMFACTOR1) + (1|RANDOMFACTOR2)

Thanks in advance for all feedback!
Best regards, Daniel

Pinheiro, J. & Bates, D. 2000 Mixed-Effects Models in S and S-PLUS, Springer-Verlag New York
Gałecki, A. & Burzykowski, T. 2013 Linear Mixed-Effects Models Using R, Springer-Verlag New York 
Serrano et al. 2011 Use of Maximum Likelihood-Mixed Models to select stable reference genes: a case of heat stress response in sheep. BMC Molecular Biology 2011 12:36. https://doi.org/10.1186/1471-2199-12-36