[R-sig-ME] Random versus fixed effects

John Maindonald john.maindonald at anu.edu.au
Sat Jul 5 01:51:20 CEST 2008


I have made this sort of comment before, but I think it
important enough to have another go, in a bit more detail.

The extent of generalization
~~~~~~~~~~~~~~~~~~~~~~
Surely the key issue is: "To what population do you wish to
generalize?"  If one wants to generalize to other schools,
then (as in the science data set in DAAG) one must have
data that can be treated as a random sample of schools.

For the science data, it turns out that the schools component
of variance is so small that it can be treated as zero --
differences between classes seem, apart from individual
variation, to be the only random effect needed.  Moreover, the
39 degrees of freedom for the schools component of variance
are large enough that omission of this ~0 component makes little
difference to the inference.  Thus it may reasonably be omitted,
simplifying the analysis.
[Those who want to avoid talk of degrees of freedom might
go directly to comparison of the two inferences, one with the
schools component of variance, and the other without.
Degrees of freedom are a rough, but often useful, information
measure.]
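
In code, that comparison might look something like the following.
This is a minimal sketch only: the column names (like, sex, PrivPub,
school, class) are my reading of the DAAG science data set, so check
?science before running.

library(DAAG)   # science data set
library(nlme)

## Random effects for schools, and for classes within schools
## (column names assumed -- check names(science))
science.lme <- lme(like ~ sex + PrivPub, data = science,
                   random = ~ 1 | school/class, na.action = na.omit)
VarCorr(science.lme)     # the schools component comes out close to zero

## Drop the ~0 schools component: classes, uniquely labelled, become
## the only random effect beyond individual variation
science$classID <- with(science, interaction(school, class, drop = TRUE))
science.lme2 <- lme(like ~ sex + PrivPub, data = science,
                    random = ~ 1 | classID, na.action = na.omit)

## Compare the two inferences directly; both are REML fits with the
## same fixed effects, so anova() gives a (conservative) test of the
## schools component
anova(science.lme, science.lme2)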

What if the degrees of freedom for the schools component had
been small, and omission of this component did affect the
inference?  Subject area knowledge and experience must
then come into play: is a schools component of variance
likely?  Do other studies show evidence of it?  If so, of what
magnitude?  And so on.

Normality
~~~~~~~
This, while sometimes important, is a second-order issue.
The Central Limit Theorem comes to our aid if there is
some modest number of degrees of freedom at the relevant
level.  One can always try transforming the data if they seem
grossly non-normal at the relevant level of variation.
(Checking this is, however, non-trivial; plots of residuals
typically mix in other, not-all-that-relevant, levels of variation.)
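
If I remember the nlme interface correctly, one can at least look
level by level, using the fit from the sketch above:

qqnorm(science.lme, ~ resid(., type = "p"))  # individual (within-class) level
qqnorm(science.lme, ~ ranef(., level = 1))   # school-level effects
qqnorm(science.lme, ~ ranef(., level = 2))   # class-within-school effects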

Random effect vs fixed effect
~~~~~~~~~~~~~~~~~~~~~~~
If the intention is to make statements only about the specific
schools included in the study, then schools may be treated
as fixed effects.  In this case, for the science data set, there
is no detectable difference between schools, and such a
fixed effect can be omitted.
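
In code the contrast is roughly as follows; y, treatment, group and
dat are hypothetical names, not the science variables.

library(nlme)
## group as fixed: inferences apply only to these particular groups
fm.fixed  <- lm(y ~ treatment + group, data = dat)
## group as random: inferences generalize to the population of groups
fm.random <- lme(y ~ treatment, random = ~ 1 | group, data = dat)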

Other reasons for use of random effects
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
As has been mentioned, there may be reasons, other than
the wish to generalize appropriately, for modeling an
effect as random.  If one is comparing 50 varieties of wheat,
the estimates that are at the extremes will likely over-estimate
the relevant effects.  The BLUPs that are calculated from an
analysis that treats the variety effects as random pull the
estimates in towards the mean by amounts that, under often
plausible model assumptions, are appropriate.
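
A sketch of the shrinkage, again with hypothetical names (yield,
variety, and a data frame wheat):

library(nlme)
## Unshrunken fixed-effect estimates: one mean per variety
variety.means <- coef(lm(yield ~ variety - 1, data = wheat))

## Variety as a random effect: the BLUPs are pulled in towards the
## grand mean
wheat.lme <- lme(yield ~ 1, random = ~ 1 | variety, data = wheat)
variety.blups <- coef(wheat.lme)[["(Intercept)"]]  # intercept + predicted effects

## The spread of the BLUPs is narrower than that of the raw means
range(variety.means)
range(variety.blups)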

John Maindonald             email: john.maindonald at anu.edu.au
phone : +61 2 (6125)3473    fax  : +61 2(6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.


On 5 Jul 2008, at 12:16 AM, Rune Haubo wrote:

> Hi Luis
>
> I largely agree with Mike's answer and have the following additional
> comments: The decision of whether a variable is taken as fixed or
> random often rests on subject-specific matters. An important question
> is: Can the levels of the variable be considered as coming from a
> normal distribution? But other aspects also play a role, such as the
> number of realized levels of the variable (with only few levels, it
> will often be appropriate to treat the variable as fixed anyhow). The
> models rest on different distributional assumptions, so the decision
> is often based on weighing the appropriateness of these assumptions.
>
> To give more specific advice on the actual model comparison (ignoring
> the question of the appropriateness of the comparison), it matters
> whether you are thinking in terms of linear mixed models or
> generalized linear mixed models. In the former case assuming you have
> only one random effect and assuming lme is sufficient, you can do
>
> fm.lme <- lme(....)
> fm.lm <- lm(...)
> anova(fm.lme, fm.lm)
>
> If you are thinking in terms of generalized linear mixed models, and
> you are using lmer, then maybe you can use something like
>
> deviance(fm.lmer <- lmer(...))
> deviance(fm.glm <- glm(...))
>
> however, the reference distribution for the difference in deviance
> depends on the actual body of the function calls.
>
> Regards
> Rune
>
> 2008/7/4 Luis Orlindo Tedeschi <luis.tedeschi at gmail.com>:
>> Thanks Mike... and I thought it would have a single answer... I glanced
>> over the link you provided; it will take me some time to digest it.  My
>> current problem is comparing a model with variable A as random effect vs
>> a model with variable A as fixed effect. It gets very confusing. Thanks
>> again. Luis
>>
>> On Fri, 2008-07-04 at 09:28 +0100, Mike Dunbar wrote:
>>> Dear Luis
>>>
>>> It is not necessarily straightforward but there is a lot of
>>> information out there that can help you. Take a look at http://wiki.r-project.org/rwiki/doku.php?id=guides:lmer-tests
>>> and also look through the archives of this list, e.g. the thread
>>> entitled "[R-sig-ME] interpreting significance from lmer results
>>> for dummies (like me)"
>>>
>>> regards
>>>
>>> Mike
>>>
>>>
>>>>>> Luis Orlindo Tedeschi <luis.tedeschi at gmail.com> 03/07/2008 22:23 >>>
>>> Folks; I have a quick question about model comparison. Is it ok to use
>>> BIC/AIC/-2log to compare models with different fixed and random effects
>>> and even different var-(co)var structure? How can I accomplish this
>>> using R? Will Anova do the correct comparison of different models?
>>> Thanks in advance. Luis
>>>
>>> --
>>> Luis Orlindo Tedeschi <luis.tedeschi at gmail.com>
>>>
>>> _______________________________________________
>>> R-sig-mixed-models at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>
>>>
>> --
>>
>> +----------------------------------------------------+
>>            Luis O. Tedeschi, PhD, PAS
>>               Assistant Professor
>>               Texas A&M University
>>
>> 230 Kleberg Center               p. (+1) 979-845-5065
>> 2471 TAMU                        f. (+1) 979-845-5292
>> College Station, TX 77843-2471
>>
>> http://nutritionmodels.tamu.edu
>> http://nutr.tamu.edu
>> http://people.tamu.edu/~luis.tedeschi
>> +----------------------------------------------------+
>>
>> _______________________________________________
>> R-sig-mixed-models at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>
>
>
>
> -- 
> Rune Haubo Bojesen Christensen
>
> Master Student, M.Sc. Eng.
> Phone: (+45) 30 26 45 54
> Mail: rhbc at imm.dtu.dk, rune.haubo at gmail.com
>
> DTU Informatics, Section for Statistics
> Technical University of Denmark, Build.321, DK-2800 Kgs. Lyngby, Denmark
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models



