[R] variance estimates in lme biased?

Peter Dalgaard p.dalgaard at biostat.ku.dk
Wed Dec 17 17:19:15 CET 2003


Thomas Lumley <tlumley at u.washington.edu> writes:

> On Tue, 16 Dec 2003, Gary Allison wrote:
> 
> > Hi all,
> > I didn't get a response to my post of this issue a week ago, so I've
> > tried to clarify:
> >
> > When I use lme to analyze a model of nested random effects, the variance
> > estimates of levels higher in the hierarchy appear to have much more
> > variance than they should.
> >
> > In the example below with 4 levels, I simulate variance in level 2
> > (sd=1.0) and level 4 (sd=0.1), but levels 1 and 3 do not vary.  Although
> > I expected to see a small amount of variability in the lme estimates of
> > levels 1 and 3, I am confused by what happens in the level 1 estimates:
> >   more than 10% of the runs produce stdDev of >0.4 and in many cases the
> > level 1 estimate is close to level 2's. Nothing like that happens in
> > level 3.
> >
> 
> While I haven't seen any literature on this precise problem I would not be
> particularly surprised by it.  Your model says that there is a lot of
> variability at level two but that it all averages out to zero within a
> level 1 unit.  It would not be particularly surprising if some of the
> level 2 variability leaked up to level one, since it is only being
> averaged over 5 level 2 units per level 1 unit.
> 
> Given sufficient data this will eventually stop happening, but you don't
> have very many level 1 replicates either.
> 
> I think that if you really need to estimate a small variance component
> with large variances nested within it, you need a lot more data or a
> prior.

Or, try looking at a smaller example where things can be worked out
explicitly: One-way ANOVA with random btw.group variation. Say 5
groups and 3 obs per group. If I got this right (please do check!),
the estimate of the between-group variance is 1/3 times the difference
between two chi^2/f distributed variables with 4 and 10 DF
respectively. This will become negative about half the time, and lme
(and similar code) will set it to zero in that case. Now

> sd.sim <- sqrt(pmax(1/3*(rchisq(1000,4)/4 - rchisq(1000,10)/10),0))
> summary(sd.sim)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 0.0000  0.0000  0.0000  0.2017  0.3955  1.2160

does not seem to be too far from what Gary has been experiencing. 

Obviously, the fact that the estimator is censored at zero will make
it biased, but an extended estimator (allowing negative values) is
unbiased. 

> var.sim <- 1/3*(rchisq(1000,4)/4 - rchisq(1000,10)/10)
> summary(var.sim)
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
-0.796100 -0.196900 -0.042720 -0.007667  0.138500  1.104000



-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907




More information about the R-help mailing list