[R-sig-ME] Random effects in R vs SAS

John H Maindonald  jhmaindonald at gmail.com
Fri Aug 18 10:01:29 CEST 2023


Will, an immediate comment is that what may be accommodated are variance
'components'.  They are not the variances of any identifiable quantity, and
it is surely misleading to call them variances.  For residuals, the component
is at the same time a variance.  With a suitably balanced experimental design,
one can work with the function aov() to obtain a breakdown from which a
negative component can be extracted.

The following inflates the within-block variances; it mirrors a context where
the plots within each block have been designed to be more heterogeneous than
plots in general.

set.seed(29)
library(lme4)
df0 <- data.frame(block=rep(1:100, rep(10, 100)), trt=rep(1:5, 200),
                  y=rnorm(1000, 4, 1))
## Plot variances are all set to 1.0
df0$y <- df0$y + df0$trt*0.1   ## Differences between treatments
## Add systematic within-block variation: the 10 different systematic
## effects for the 10 plots in a block are randomly assigned
systeff <- 0.4*(1:10)
df <- df0
for(i in 1:100) df$y[df$block==i] <- df$y[df$block==i] + sample(systeff)
## sample() does the random assignment
df$block <- factor(df$block); df$trt <- factor(df$trt)
y.aov <- aov(y ~ trt + Error(block), data=df)

> summary(y.aov)

Error: block
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals 99  127.7    1.29               

Error: Within
           Df Sum Sq Mean Sq F value Pr(>F)
trt         4   23.5   5.865   2.302 0.0569
Residuals 896 2282.8   2.548      
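
As noted above, the negative block component can be extracted from this
breakdown.  A minimal sketch of the method-of-moments calculation, assuming
the usual expected mean squares for a balanced design with 10 plots per block,
and using the mean squares as rounded in the table:

> ## sigma2_block = (MS_block - MS_within) / (number of plots per block)
> (1.29 - 2.548)/10
[1] -0.1258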

> y.lmer <- lmer(y~trt + (1|block), data=df)
boundary (singular) fit: see help('isSingular')

> summary(y.lmer)
Linear mixed model fit by REML ['lmerMod']
Formula: y ~ trt + (1 | block)
   Data: df

REML criterion at convergence: 3776.3

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-2.78060 -0.72441  0.04553  0.70139  2.79109 

Random effects:
 Groups   Name        Variance Std.Dev.
 block    (Intercept) 0.000    0.000   
 Residual             2.537    1.593   
Number of obs: 1000, groups:  block, 100

. . .
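
lmer() constrains variance components to be non-negative, so the block
component is pinned to zero and the fit is flagged as singular.  A quick
check, which only restates the warning:

> isSingular(y.lmer)
[1] TRUE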

> library(nlme)
> df.lme <- lme(y~trt, random=~1|block, data=df)
> ## No complaints!  This is a worry.

> summary(df.lme)
Linear mixed-effects model fit by REML
  Data: df 
       AIC      BIC    logLik
  3790.336 3824.655 -1888.168

Random effects:
 Formula: ~1 | block
         (Intercept) Residual
StdDev: 6.727467e-05 1.592659

Fixed effects:  y ~ trt 
               Value Std.Error  DF  t-value p-value
(Intercept) 6.096161 0.1126180 896 54.13131  0.0000
trt2        0.305667 0.1592659 896  1.91922  0.0553
trt3        0.418814 0.1592659 896  2.62965  0.0087
trt4        0.515445 0.1592659 896  3.23638  0.0013
trt5        0.561525 0.1592659 896  3.52571  0.0004

. . .

In principle, one should be able to set up a within-block correlation
structure that accommodates the negative component of variance.
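
One way to sketch that (not something I have checked carefully) is to drop the
random intercept and fit compound symmetry as a correlation structure with
gls() from nlme (already loaded above); corCompSymm allows the within-block
correlation to be negative, which corresponds to a negative block component:

> ## Compound symmetry as a within-block correlation, estimated by REML;
> ## with these simulated data the estimated Rho should come out negative.
> df.gls <- gls(y ~ trt, correlation = corCompSymm(form = ~ 1 | block),
                data = df)
> summary(df.gls)      ## Rho is reported under 'Correlation Structure'
> intervals(df.gls)    ## approximate confidence interval for Rho
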
John Maindonald
Statistics Research Associates
Wellington NZ.

> On 18/08/2023, at 17:06, Will Hopkins <willthekiwi at gmail.com> wrote:
> 
> I am a SAS user, and I have just finished a near-final draft of an article
> about identifying and specifying fixed and random effects for mixed models
> in SAS. At the moment, all I say about R is this: "To all users of the R
> stats package, my apologies: a few years ago, I found the mixed model in R
> too limiting, and I struggled with the coding. I hope someone with R smarts
> will consider adapting my programs and publishing them here." I really would
> like to know if the limitations (or at least, what I consider to be
> limitations) have been addressed, so I can be more specific in my article.
> Specifically, can anyone answer these questions?
> 
> 1. Can you specify negative variance for random effects in R? (That doesn't
> apply to the variances representing residuals, which are never negative).
> 
> 2. Can you get trustworthy estimates of standard errors for the
> random-effect and residual variances in R? (By trustworthy, I guess I mean
> the same as SAS's, but it could mean that someone has shown that the
> confidence intervals derived with the SEs have good coverage, assuming
> normality for the variances representing random effects and chi-squared for
> variances representing residuals.) 
> 
> 3. Does R have the equivalent of the repeated statement in SAS, whereby you
> can specify a repeated measure with an unstructured or other covariance
> matrix? (e.g., repeated Time/subject=SubjectID type=un. This statement works
> more reliably than the random statement in SAS with small sample sizes, but
> it doesn't produce residual variances, not directly anyway.)
> 
> 4. Does R have the equivalent of the group= option in SAS, whereby you can
> specify separate estimates of random-effect variances and covariances,
> and/or separate estimates of residuals, for different groups? (e.g., random
> SubjectID/group=Sex; repeated/group=Sex;)
> 
> The draft article is available at
> https://sportsci.org/2023/EffectsModels.htm. It's not published yet (i.e.,
> not linked to the Sportscience homepage yet). I would be grateful for any
> feedback. I can incorporate more about R and would love someone to provide R
> code for the programs I have published. See below for the title and
> abstract.
> 
> Will
> 
> How to Identify and Specify Fixed and Random Effects for Linear Mixed Models
> in the Statistical Analysis System
> 
> Most analyses require linear mixed modeling to properly account not only for
> mean effects but also for sources of variability and error in the data. In
> linear mixed models, means and their differences are specified with fixed
> effects, while variabilities and errors are specified with random effects
> (including residuals) and are summarized as standard deviations. In this
> tutorial article, I explain how variables represent effects in linear mixed
> models and how identifying clusters of observations in a dataset is the key
> to identifying the fixed and random effects. I also provide programs written
> in the language of the Statistical Analysis System to simulate data and
> analyze them with the general linear mixed model, Proc Mixed, which is used
> when the dependent variable is continuous. The analyses include simple
> linear regression, reliability and time series, controlled trials, and
> combined within- and between-subject modeling. Finally, I explain how
> dependent variables representing counts or proportions require generalized
> linear mixed models, realized in SAS with Proc Glimmix, where the main
> difference is the specification of the residuals.