[R] Efficiency of random and fixed effects estimator
Ben Bolker
bbolker at gmail.com
Tue Aug 23 19:01:55 CEST 2011
Daniel Malter <daniel <at> umd.edu> writes:
>
> id <- rep(1:100, each=2)
> obs <- rep(0:1, 100)
> d <- rep(sample(c(-1,1), 100, replace=TRUE), each=2)
> base.happy <- rep(rnorm(100), each=2)
> happy <- base.happy + 1.5*d*obs + rnorm(200)
>
> data <- data.frame(id, obs, d, happy)
>
> > I am statistically confused tonight. When the assumptions of the random
> > effects estimator hold, random effects should be more efficient than the
> > fixed effects estimator because it uses fewer degrees of freedom
> > (estimating just the variance parameter of the normal rather than using
> > one df for each included fixed effect, I thought). However, I don't find
> > this to be the case in this simulated example.
> >
> > For the sake of the example, assume you measure subjects' happiness before
> > exposing them to a happy or sad movie, and then you measure their
> > happiness again after watching the movie. Here, "id" marks the subject,
> > "obs" marks the pre- and post-treatment observations, "d" is the treatment
> > indicator (whether the subject watched the happy or sad movie),
> > "base.happy" is the ~N(0,1)-distributed individual effect a(i), happy is
> > the measured happiness for each subject pre- and post-treatment,
> > respectively, and the error term u(i,t) is also distributed ~N(0,1).
> >
> > # Now run the random and fixed effects models
> library(lme4)
> reg.re <- lmer(happy ~ factor(obs)*factor(d) + (1|id), data=data)
> summary(reg.re)
>
> reg.fe1 <- lm(happy ~ factor(id) + factor(obs)*factor(d), data=data)
> summary(reg.fe1)
>
> library(plm)
> reg.fe2 <- plm(happy ~ factor(obs)*factor(d), index=c('id','obs'),
>                model="within", data=data)
> summary(reg.fe2)
> >
> > I am confused about why the FE and RE models are virtually equally
> > efficient in this case. Can somebody clear up my confusion?
> >
I have a couple of questions/suggestions:
(1) This would probably get more appropriate attention on the
r-sig-mixed-models list (I'm not forwarding it there myself because
I'm answering via Gmane and I'm lazy). [When/if you forward it,
do mention that you're cross-posting from R-help on purpose ...]
(2) Can you be more specific about your metric of efficiency in this
case? Are you referring to the fact that the estimates/standard errors,
etc., are identical?
lm:
Estimate Std. Error t value Pr(>|t|)
factor(obs)1:factor(d)1 3.39763 0.25200 13.483 < 2e-16 ***
plm:
Estimate Std. Error t-value Pr(>|t|)
factor(obs)1:factor(d)1 3.39763 0.25200 13.4826 < 2.2e-16 ***
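One rough way to make the efficiency comparison concrete is to line up
the standard error of the treatment (interaction) term across the three
fits. This is just a sketch, assuming the objects reg.re, reg.fe1, and
reg.fe2 from your code are in the workspace and reasonably recent
versions of lme4 and plm (the term name is copied from the output above;
the exact accessors may differ across versions):

term <- "factor(obs)1:factor(d)1"
## standard error of the interaction term in each fit
se.re  <- coef(summary(reg.re))[term, "Std. Error"]          # lmer (RE) fit
se.fe1 <- coef(summary(reg.fe1))[term, "Std. Error"]         # lm with id dummies
se.fe2 <- summary(reg.fe2)$coefficients[term, "Std. Error"]  # plm "within" fit
c(RE = se.re, FE.lm = se.fe1, FE.plm = se.fe2)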
(3) I think that in this case the random-effects test, when estimated
by REML, is equivalent to a (pooled) t-test on
(happy(after) - happy(before)) between the d=-1 and d=1 groups ... ?
Is that useful?
library(plyr)
diffdata <- ddply(data, "id",
                  function(x) data.frame(d=x$d[1], dhappy=diff(x$happy)))
with(diffdata, t.test(dhappy[d==1], dhappy[d==-1], var.equal=TRUE))
data: dhappy[d == 1] and dhappy[d == -1]
t = 13.4826, df = 98, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 2.897540 3.897713
sample estimates:
mean of x mean of y
1.784575 -1.613051
Note that the t-statistic here (13.4826) is the same as in the
fixed-effects fits above.
I think that, in this balanced case, there is simply no difference in
efficiency between the two estimators ...
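If it helps, here is a quick numerical check of the equivalence (again
just a sketch; it assumes reg.re and diffdata from above are still in
the workspace, and a reasonably recent lme4 where coef(summary()) on the
lmer fit returns the fixed-effect table):

term <- "factor(obs)1:factor(d)1"
tt <- with(diffdata, t.test(dhappy[d==1], dhappy[d==-1], var.equal=TRUE))
## t statistic for the interaction in the RE fit vs. the pooled t-test
c(t.lmer   = coef(summary(reg.re))[term, "t value"],
  t.pooled = unname(tt$statistic))

The t statistics from the lm and plm fixed-effects fits above can be
compared the same way.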