[R] Efficiency of random and fixed effects estimator

Ben Bolker bbolker at gmail.com
Tue Aug 23 19:01:55 CEST 2011


Daniel Malter <daniel <at> umd.edu> writes:

> 
> id<-rep(c(1:100),each=2)
> obs<-rep(c(0:1),100)
> d<-rep(sample(c(-1,1),100,replace=T),each=2) 
> base.happy<-rep(rnorm(100),each=2)
> happy<-base.happy+1.5*d*obs+rnorm(200)
> 
> data<-data.frame(id,obs,d,happy)
> 
> > I am statistically confused tonight. When the assumptions of a random
> > effects estimator are warranted, random effects should be more
> > efficient than the fixed effects estimator because it uses fewer
> > degrees of freedom (estimating just the variance parameter of the
> > normal rather than spending one df on each included fixed effect, I
> > thought). However, I don't find this to be the case in this simulated
> > example.
> > 
> > For the sake of the example, assume you measure subjects' happiness
> > before exposing them to a happy or sad movie, and then you measure
> > their happiness again after they watch the movie. Here, "id" marks
> > the subject, "obs" marks the pre- and post-treatment observations,
> > "d" is the treatment indicator (whether the subject watched the happy
> > or sad movie), "base.happy" is the ~N(0,1)-distributed individual
> > effect a(i), "happy" is the measured happiness for each subject pre-
> > and post-treatment, respectively, and the error term u(i,t) is also
> > ~N(0,1).
> > 

> > # Now run the random and fixed effects models
> library(lme4)
> reg.re<-lmer(happy~factor(obs)*factor(d)+(1|id),data=data)
> summary(reg.re)
> 
> reg.fe1<-lm(happy~factor(id)+factor(obs)*factor(d),data=data)
> summary(reg.fe1)
> 
> library(plm)
> reg.fe2<-plm(happy~factor(obs)*factor(d),index=c('id','obs'),
>   model="within",data=data)
> summary(reg.fe2)

> > 
> > I am confused about why the FE and RE models are virtually equally
> > efficient in this case. Can somebody clear up my confusion?
> > 

  I have a couple of questions/suggestions:

(1) this would probably get more appropriate attention on the 
r-sig-mixed-models list (I'm not forwarding it there myself because
I'm answering via Gmane and I'm lazy).  [When/if you forward it,
do mention that you're cross-posting from R-help on purpose ...]

(2) Can you be more specific about your metric of efficiency in this
case?  Are you referring to the fact that the estimates / standard errors
etc. are identical?

lm: 

                        Estimate Std. Error t value Pr(>|t|)    
factor(obs)1:factor(d)1  3.39763    0.25200  13.483  < 2e-16 ***

plm:

                        Estimate Std. Error t-value  Pr(>|t|)    
factor(obs)1:factor(d)1  3.39763    0.25200 13.4826 < 2.2e-16 ***
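
For a direct comparison you could also pull the corresponding row out of
the REML fit -- roughly something like the following (a sketch only, not
run; I'm assuming the reg.re and reg.fe1 objects from your code above,
and the accessors may vary a bit across lme4 versions):

cterm <- "factor(obs)1:factor(d)1"
fixef(reg.re)[cterm]              ## RE estimate of the interaction
sqrt(diag(vcov(reg.re)))          ## RE standard errors (all fixed effects)
coef(summary(reg.fe1))[cterm,]    ## corresponding lm row, for comparison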

 
(3) I think that in this case the random-effects test,
when estimated by REML, is equivalent to a (pooled) two-sample
t-test on the differences happy(after)-happy(before) between the
d=-1 and d=1 groups ... ?  Is that useful?

library(plyr)
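## one row per subject: treatment indicator and post-minus-pre difference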
diffdata <- ddply(data,"id",
                  function(x) data.frame(d=x$d[1],dhappy=diff(x$happy)))
with(diffdata,t.test(dhappy[d==1],dhappy[d==-1],var.equal=TRUE))

	Two Sample t-test

data:  dhappy[d == 1] and dhappy[d == -1] 
t = 13.4826, df = 98, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0 
95 percent confidence interval:
 2.897540 3.897713 
sample estimates:
mean of x mean of y 
1.784575 -1.613051

Note the same t-statistic here.

I think that, in this balanced case, there is simply no difference
between the FE and RE estimates of the treatment effect ...
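
If you want to see where the efficiency argument would actually bite,
one thing to try (an untested sketch, reusing the simulated data frame
above) is to knock the design off balance and refit; with unequal group
sizes I would expect the RE and FE standard errors for the interaction
to stop coinciding:

set.seed(101)                   ## arbitrary seed, for reproducibility only
omit <- sample(nrow(data),40)   ## discard 40 rows at random
data2 <- data[-omit,]
re2 <- lmer(happy~factor(obs)*factor(d)+(1|id),data=data2)
fe2 <- plm(happy~factor(obs)*factor(d),index=c('id','obs'),
           model="within",data=data2)
## compare the interaction rows of summary(re2) and summary(fe2)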


