[R-sig-ME] Efficiency of random and fixed effects estimators

John Maindonald john.maindonald at anu.edu.au
Wed Aug 24 11:07:28 CEST 2011


In the following, reg.fe1 is parameterised a bit differently, to make its output 
more directly comparable with that from the random effects model reg.re1
(also I have reordered the terms so that the aliased parameter is that 
associated with a level (as it happens, the 99th) of id)

reg.fe1<-lm(happy~factor(obs)*factor(d)+factor(id))
summary(reg.fe1)

reg.fe1<-lm(happy~factor(obs)*factor(d)+C(factor(id),contr.sum))
summary(reg.fe1)

Coefficients: (1 not defined because of singularities)
                           Estimate Std. Error t value Pr(>|t|)
(Intercept)                 0.01593    0.69578   0.023 0.981781
factor(obs)1               -1.63826    0.20978  -7.809 6.46e-12
factor(d)1                  0.10152    1.20867   0.084 0.933233
. . . .
factor(obs)1:factor(d)1     3.11316    0.27787  11.204  < 2e-16

                        Estimate Std. Error t value
(Intercept)              0.11537    0.22373   0.516
factor(obs)1            -1.63826    0.20978  -7.809
factor(d)1              -0.07294    0.29634  -0.246
factor(obs)1:factor(d)1  3.11316    0.27787  11.204

The parameters factor(obs)1 and factor(obs)1:factor(d)1 are estimated in the stratum 
of individual observations (2 observations per id). Note that factor(obs)1:factor(d)1 
is the difference of level 2 of obs from level 1 of obs at the second level of d.
Hence the random effects analysis gives estimates that are exactly the same, with 
the same SEs, as for the fixed effects analysis.  Degrees of freedom are also exactly
the same.

The parameter factor(d) contrasts individuals (2 observations each) exposed to one
type of movie with those exposed to the other type.  Variation between individuals
now comes into play.  The SE (really an SED) for the random effects analysis is now 
very substantially greater than the (seriously biased downwards) SE for the fixed
effects analysis.  The random variation added in base.happy<-rep(rnorm(100),each=2)
now comes into play, in addition to the random variation associated with individual
observations, added in happy<-base.happy+1.5*d*obs+rnorm(200). 

John Maindonald             email: john.maindonald at anu.edu.au
phone : +61 2 (6125)3473    fax  : +61 2(6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
http://www.maths.anu.edu.au/~johnm

On 23/08/2011, at 2:06 PM, Daniel Malter wrote:

> Hi all, 
> 
> I am statistically confused tonight. When the assumptions to a random effects estimator are warranted, the random effects estimator should be the more efficient estimator than the fixed effects estimator because it uses fewer degrees of freedom (estimating just the variance parameter of the normal rather than using one df for each included fixed effect). Consequently, the standard errors of the coefficients should be smaller in random than in fixed effects estimation, I thought, provided that RE assumptions are warranted. However, I do not find this to be the case in this simulated example. 
> 
> For the sake of the example, assume you measure subjects' happiness before exposing them to a happy or sad movie, and then you measure their happiness again after watching the movie. Here, "id" marks the subject, "obs" marks the pre- and post-treatment observations of a subject, "d" is the treatment indicator (whether the subject watched the happy or sad movie), "base.happy" is the ~N(0,1)-distributed individual effect a(i), happy is the measured happiness for each subject pre- and post-treatment, respectively, and the error term u(i,t) is also distributed ~N(0,1). 
> 
> id<-rep(c(1:100),each=2) 
> obs<-rep(c(0:1),100) 
> d<-rep(sample(c(-1,1),100,replace=T),each=2) 
> base.happy<-rep(rnorm(100),each=2) 
> happy<-base.happy+1.5*d*obs+rnorm(200) 
> 
> data<-data.frame(id,obs,d,happy) 
> 
> # Now run the random and fixed effects models 
> 
> require(lme4) 
> require(plm)
> 
> reg.re1<-lmer(happy~factor(obs)*factor(d)+(1|id)) 
> summary(reg.re1)
> 
> reg.re2<-plm(happy~factor(obs)*factor(d),index=c('id','obs'),model="random",data=data) 
> summary(reg.re2) 
> 
> 
> reg.fe1<-lm(happy~factor(id)+factor(obs)*factor(d)) 
> summary(reg.fe1) 
> 
> reg.fe2<-plm(happy~factor(obs)*factor(d),index=c('id','obs'),model="within",data=data) 
> summary(reg.fe2) 
> 
> 
> 
> I am confused why FE and RE models are virtually equally efficient in this case. Can somebody lift my confusion? 
> 
> Thanks much, 
> Daniel 
> 
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models




More information about the R-sig-mixed-models mailing list