[R] Behaviour of standard error estimates in lmRob and the like

Fri Jan 11 17:03:34 CET 2008

I am looking at MM-estimates for some interlab comparison work. The
usual situation in this particular context is a modest number of results
from very expensive methods with abnormally well-characterised
performance, so for once we have good "variance" estimates (which can
differ substantially for good reason) from most labs. But there remains
room for human error or unexpected chemistry which still causes some
outliers. MM-estimates from robust regression with variance weights is a
suggested approach with promising features. It's a sledgehammer approach
to getting a mean value, but ....

Now, rlm, lmrob and lmRob all seem to do the job . But one of the
things of particular interest to this particular lab community is the
standard error on that mean value, and there I am seeing some unexpected
anomalies in lmRob. rlm, by contrast, is pretty much rock steady, and
lmrob little different to rlm. 

The practical answer is not to use lmRob, but I'd like to better
understand the cause. (It's a confidence thing). Clearly, the scale
estimator in lmRob is doing something odd - but what? 

Any observations/pointers much appreciated.

By way of example, try (for exaggerated behaviour):

library(robust)   #for lmRob
library(robustbase)  #lmrob - R's own version?
library(MASS)   #for rlm
#The presence of, or ordering, of the three packages does not affect
the outcome from lmRob

N<-10
z<-matrix(rnorm(N*1000), ncol=N)
se.MM.rlm<-apply(z[,1:N],1, function(z) coef(summary(rlm(z~1,
method="MM")))[2] )
se.MM<-apply(z[,1:N],1, function(z) coef(summary(lmRob(z~1,
control=lmRob.control(efficiency=0.95)) ))[2] )
se.MM.lmrob<-apply(z[,1:N],1, function(z) coef(summary(lmrob(z~1)) )[2]
)
         #.. and some simpler estimators for comparison, since this is
a simple data set.
se.med<-apply(z[,1:N],1, function(x,N) mad(x)/sqrt(N),N=N )
se.mean<-apply(z[,1:N],1, function(x,N) sd(x)/sqrt(N),N=N )
se.H<-apply(z[,1:N],1, function(z) hubers(z,k=1.35)$s/sqrt(N) )
#expected 95% efficiency

se.vals<-data.frame(Mean=se.mean, Median=se.med, Huber=se.H,
MM.rlm=se.MM.rlm, MM.lmrob=se.MM.lmrob, MM=se.MM)
se.stk<-stack(se.vals)

windows()
boxplot(values~ind, data=se.stk, main=paste("STD Errors: N=",N),
ylab="Std Err")
abline(h=sqrt(1/N))   #Around about .3 for N=10
abline(h=sqrt(1/0.95)*sqrt(1/N),lty=2) #95% efficiency line
#Now, I don't mind a difference, but a factor of 200 is a bit much to
swallow.

#Steve Ellison

*******************************************************************
This email contains information which may be confidentia...{{dropped:5}}