[R] Robust Standard Errors for lme object

Doran, Harold HDoran at air.org
Tue Aug 7 18:11:21 CEST 2007


> -----Original Message-----
> From: Thomas Lumley [mailto:tlumley at u.washington.edu] 
> Sent: Tuesday, August 07, 2007 11:06 AM
> To: Doran, Harold
> Cc: Lucy Radford; r-help at stat.math.ethz.ch
> Subject: Re: [R] Robust Standard Errors for lme object
> 
> On Tue, 7 Aug 2007, Doran, Harold wrote:
> 
> > Lucy:
> >
> > Why are you interested in robust standard errors from lme? Typically,
> > robust standard errors are sought when there is model misspecification
> > due to ignoring some covariance among units within a group.
> >
> > But, a mixed model is designed to directly account for covariances 
> > among units within a group such that the standard errors more 
> > adequately represent the true sampling variance of the parameters.
> 
> 
> I think it's a perfectly reasonable thing to want, but it is 
> not easy to provide because of the generality of lme().  For 
> example, models with crossed effects need special handling, 
> and possibly so do time-series or spatial covariance models 
> with large numbers of observations per group.

I still don't understand why we might want them. If there were some
correlation among units within a group, then we would not have independent
observations. That is, we have less information from a cluster sample
than from a simple random sample because each individual does not
provide unique information. Hence, if one were to use an OLS regression
when there is clustering, then the likelihood function is misspecified.
Now the MLEs from this regression may still retain some pragmatic
interest (they *may* still be consistent), but the sampling variances of
the fixed effects would be incorrect (most likely too small). So, one
may obtain robust standard errors to account for model misspecification
in this scenario.

On the other hand, if the researcher knows that there may be a violation
of independence because of clustering, then one may be motivated to rely
on a mixed model in which case the likelihood is not misspecified since
the posterior views the observations as conditionally independent given the
population distribution. Therefore, the sampling variances of the fixed
effects are correct. Why get robust standard errors when the likelihood
function is correctly specified?

To make this concrete, assume it were an educational testing example
where each student is tested and students are grouped into classrooms.
Hence we have N total students clustered within K groups. Instruction
happens at the group level and students within a classroom are
presumably not providing independent information.

Now, if I were to do an OLS regression with test scores as the outcome
and some variable x as the independent variable, this model would be
misspecified given the clustering and the standard errors would not be
correct. But, if I were to use lmer and accounted for this clustering,
then the standard errors of the variable x would be accurate. Why would
I go and get robust standard errors in this case?
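
The classroom scenario above can be tried out on simulated data. What follows is only a rough sketch: the data are made up, the names (score, x, classroom) and all numeric values are hypothetical, and it uses lme() from nlme rather than lmer. Here x is taken to be a classroom-level covariate, which is the case where ignoring the clustering hurts the OLS standard error most.

```r
# Sketch only: simulated data, every name and value here is hypothetical.
library(nlme)  # provides lme(); a recommended package shipped with R

set.seed(1)
K <- 30                                  # number of classrooms
n <- 20                                  # students per classroom
classroom <- factor(rep(seq_len(K), each = n))
x <- rep(rnorm(K), each = n)             # classroom-level covariate
u <- rep(rnorm(K, sd = 1), each = n)     # shared classroom effect
score <- 50 + 2 * x + u + rnorm(K * n, sd = 1)
d <- data.frame(score, x, classroom)

# OLS ignores the clustering; lme adds a random intercept per classroom.
fit_ols <- lm(score ~ x, data = d)
fit_lme <- lme(score ~ x, random = ~ 1 | classroom, data = d)

# Standard errors of the fixed effects under each model.
se_ols <- sqrt(diag(vcov(fit_ols)))
se_lme <- sqrt(diag(vcov(fit_lme)))
round(rbind(OLS = se_ols, lme = se_lme), 3)
```

With a sizeable classroom effect like the one simulated here, the OLS standard error for x should come out noticeably smaller than the lme one.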

Another way to put it is this. The OLS standard errors are

Se = (X'V^{-1}X)^{-1}

Where V is a diagonal matrix and all elements along the diagonal are
equal. This same equation is used for the standard errors in a mixed
model, but V is now a block-diagonal matrix where the off-diagonals are
the covariances among individuals within the same group. Doug Bates will
kill me for using GLS notation since it doesn't jibe with lmer.
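
As a rough illustration of the two choices of V, here is a base R sketch. It treats V as known, whereas lme/lmer estimate it, and the toy design and the variance values (sigma2, tau2) are assumptions made up for the example. The standard errors are taken as the square roots of the diagonal of (X'V^{-1}X)^{-1}.

```r
# Sketch only: toy balanced design, V treated as known, values hypothetical.
K <- 6                                   # number of groups
n <- 5                                   # units per group
x <- rep(c(0, 1), times = K / 2, each = n)   # group-level covariate
X <- cbind(intercept = 1, x = x)

sigma2 <- 1                              # residual variance (assumed)
tau2   <- 0.5                            # between-group variance (assumed)
rho    <- tau2 / (sigma2 + tau2)         # within-group correlation

# Diagonal V with equal elements: the independence (OLS) view.
V_diag <- diag(sigma2 + tau2, K * n)

# Block-diagonal V: covariance tau2 between units in the same group.
V_block <- kronecker(diag(K), tau2 * matrix(1, n, n) + sigma2 * diag(n))

# Square roots of the diagonal of (X' V^{-1} X)^{-1}.
se_from_V <- function(X, V) sqrt(diag(solve(t(X) %*% solve(V) %*% X)))

se_diag  <- se_from_V(X, V_diag)
se_block <- se_from_V(X, V_block)
rbind(diagonal = se_diag, block = se_block)

# In this balanced design the variance ratio equals 1 + (n - 1) * rho.
(se_block / se_diag)^2
```

In this particular setup the block-diagonal V inflates the sampling variances by the usual design effect, 1 + (n - 1) * rho, relative to the diagonal V.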


> 
> I imagine that misspecification of the variance, rather than 
> the correlation, would be the main concern, just as it is 
> with independent observations. Of course the model-robust 
> variances would only be useful if the sample size of 
> top-level units was large enough, and if the variance 
> components were not of any direct interest.
> 
> 
> > So, the lme standard errors are robust in a sense that they are 
> > presumably correct if you have your model correctly specified.
> 
> To paraphrase the Hitchhiker's Guide: This must be some 
> definition of the word 'robust' that I was not previously aware of. :)
> 
> 
>       -thomas
> 
> Thomas Lumley			Assoc. Professor, Biostatistics
> tlumley at u.washington.edu	University of Washington, Seattle
> 
> 
>


