[R-sig-ME] understanding log-likelihood/model fit

Tue Aug 19 22:47:15 CEST 2008

Dear All,

I'm sure this is a simple question, but I haven't been able to find an
answer to it that I can understand. I'd like an answer that is pitched
somewhere between the full mathematics on the one hand, and an
oversimplified overview on the other.

One way of putting the question is, what is the difference, really,
between a fixed and a random effect (as they are fit by lmer)?
Another way of putting it involves the following example.

Suppose we have observations of a response from many subjects.
The overall average response is 500.
The subjects fall into two groups.
Half have an effect of +100 and half an effect of -100.

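library(lme4)  # provides lmer()
# 200 subjects x 500 observations each; subjects 1-100 form group A (-100),
# subjects 101-200 form group B (+100); the residual SD is 10
test1 <- data.frame(subject=rep(1:200,each=500),
	response=500+c(rep(-100,50000),rep(100,50000))+rnorm(100000,0,10),
	fixed=rep(c("A","B"),each=50000))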

The following model treats subject as a random effect:
> null <- lmer(response~(1|subject),test1)

The following model keeps the subject effect and adds the fixed effect.
> fixed <- lmer(response~fixed+(1|subject),test1)

> null
Linear mixed model fit by REML
Formula: response ~ (1 | subject)
   Data: test1
    AIC    BIC  logLik deviance REMLdev
 746923 746951 -373458   746923  746917
Random effects:
 Groups   Name        Variance Std.Dev.
 subject  (Intercept) 10041.81 100.209
 Residual               100.46  10.023
Number of obs: 100000, groups: subject, 200
Fixed effects:
            Estimate Std. Error t value
(Intercept)  500.000      7.086   70.56

> fixed
Linear mixed model fit by REML
Formula: response ~ fixed + (1 | subject)
   Data: test1
    AIC    BIC  logLik deviance REMLdev
 743977 744015 -371984   743960  743969
Random effects:
 Groups   Name        Variance  Std.Dev.
 subject  (Intercept)  0.016654 0.12905
 Residual             99.642120 9.98209
Number of obs: 100000, groups: subject, 200
Fixed effects:
             Estimate Std. Error t value
(Intercept) 400.11806    0.04647    8610
fixedB      199.87485    0.06572    3041

The result is what one would expect, intuitively.
In the model "null" there is a large subject variance.
In the model "fixed" there is virtually no subject variance.
In both models the residual variance is essentially the same (about 100).
The logLik of the model with the fixed effect is closer to zero (by about 1500).
Therefore, we say the model with the fixed effect fits better.
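
As a quick check on the "about 1500" figure, the differences can be computed directly from the values printed above. This is plain arithmetic on the reported output, not a refit, and the variable names are made up for illustration.

```r
# Values copied from the printed summaries of "null" and "fixed"
loglik_null   <- -373458
loglik_fixed  <- -371984
remldev_null  <- 746917
remldev_fixed <- 743969

# Improvement in log-likelihood when the fixed effect is added
loglik_gain <- loglik_fixed - loglik_null
print(loglik_gain)

# The REML deviance is -2 * logLik, so it drops by twice that amount
remldev_drop <- remldev_null - remldev_fixed
print(remldev_drop)
```

Both fits use REML. Likelihood comparisons between models with different fixed effects are usually made on ML fits, but the ML deviances printed above (746923 vs 743960) point the same way.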

This makes sense. Instead of 100 subject effects near +100 and 100
near -100, we have virtually no subject effects and the fixed effect
accounts for all the between-subject variance.
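
The shift in between-subject variance can be seen without lmer at all. The following is a minimal base-R sketch, assuming the same design as test1. The seed and all variable names are my own, so the numbers will not match the output above exactly. It compares the spread of the subject means around the grand mean with their spread around their own group mean.

```r
set.seed(1)  # arbitrary seed, only so the sketch is repeatable

# Same design as test1: 200 subjects, 500 observations each
subject  <- rep(1:200, each = 500)
group    <- rep(c("A", "B"), each = 50000)
response <- 500 + rep(c(-100, 100), each = 50000) + rnorm(100000, 0, 10)

# One mean per subject, and the group each subject belongs to
subj_mean  <- as.vector(tapply(response, subject, mean))
subj_group <- as.vector(tapply(group, subject, function(g) g[1]))

# Spread of subject means around the grand mean
# (the variation "null" has to absorb in the subject random effect)
var_around_grand <- var(subj_mean)

# Spread of subject means around their own group mean
# (what is left for the random effect once the fixed effect is included)
group_mean_of_subj <- ave(subj_mean, subj_group)
var_around_group <- sum((subj_mean - group_mean_of_subj)^2) / (200 - 2)

print(c(around_grand = var_around_grand, around_group = var_around_group))
```

The first number should come out near 100^2, in line with the subject variance reported for "null". The second should be near 10^2/500 = 0.2, which is just the sampling noise of a 500-observation mean, so almost nothing is left for a subject random effect in "fixed".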

The question: why? Why does the model with the fixed effect fit better?
Why does the smaller (zero) random effect plus the fixed effect
translate into an improvement in log-likelihood?

It has nothing to do with the residuals. The models make nearly identical
predictions:

> fitted(null)[c(1:5,50001:50005)]
 [1] 400.2807 400.2807 400.2807 400.2807 400.2807 600.2013 600.2013 600.2013
 [9] 600.2013 600.2013

> fitted(fixed)[c(1:5,50001:50005)]
 [1] 400.1282 400.1282 400.1282 400.1282 400.1282 599.9839 599.9839 599.9839
 [9] 599.9839 599.9839
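
To put a number on how close the two sets of predictions are, here is a small sketch using only the fitted values and the residual SD printed above (observation 1 belongs to subject 1, observation 50001 to subject 101). The variable names are hypothetical.

```r
# Fitted values copied from the output above (subject 1 and subject 101)
fit_null  <- c(400.2807, 600.2013)
fit_fixed <- c(400.1282, 599.9839)

# Absolute gap between the two models' predictions
gap <- abs(fit_null - fit_fixed)
print(gap)

# The same gap as a fraction of the residual SD reported by "fixed"
print(gap / 9.98209)
```

The gaps are a small fraction of the residual SD, so the two models do predict almost the same values for these observations.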

And I don't think it has anything to do with the extreme non-normality
of the random effects in "null" as opposed to "fixed".
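
For reference, the non-normality in question can be made concrete with a short base-R sketch. It simulates the 200 subject means directly rather than the full data set, which is a shortcut I am assuming is adequate here, and the seed and names are made up. It compares how many subject means fall near the overall mean with how many a normal distribution with SD 100 would put there.

```r
set.seed(1)  # arbitrary seed, only so the sketch is repeatable

# Shortcut (assumption): simulate the 200 subject means directly. Each is a
# group effect of -100 or +100 plus the noise of a 500-observation mean.
subj_mean <- 500 + rep(c(-100, 100), each = 100) +
  rnorm(200, 0, 10 / sqrt(500))

# Share of subjects whose mean lies within 50 of the overall mean
observed_share <- mean(abs(subj_mean - mean(subj_mean)) < 50)

# Share expected if the subject effects really were N(0, 100^2)
expected_share <- pnorm(50, 0, 100) - pnorm(-50, 0, 100)

print(c(observed = observed_share, expected = expected_share))
```

No subject mean lands in the middle band, whereas a normal distribution with SD 100 would place well over a third of them there. This only illustrates the shape of the subject effects under "null"; it does not by itself say whether that shape matters for the log-likelihood.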

So what's the difference?

What, in terms of model fitting, makes it preferable to account for
the between-subject variation with a fixed effect (as in "fixed")
rather than with a random effect (as in "null")?

Thanks for your help,
Daniel