[R-sig-ME] Random vs. fixed effects

Emmanuel Charpentier charpent at bacbuc.dyndns.org
Fri Apr 23 23:57:21 CEST 2010


Two cents from a humble practitioner:

Le vendredi 23 avril 2010 à 14:11 -0400, Ben Bolker a écrit :
> Here's my question for the group:
> 
>   Given that it is a reasonable *philosophical* position to say 'treat
> philosophically random effects as random no matter what, and leave them
> in the model even if they don't appear to be statistically significant',
> and given that with small numbers of random-effect levels this approach
> is likely to lead to numerical difficulties in most (??) mixed model
> packages (warnings, errors, or low estimates of the variance), what
> should one do?  (Suppose one is in a situation that is too complicated
> to use classical method-of-moments approaches -- crossed designs, highly
> unbalanced data, GLMMs ...)
> 
>  1. philosophy, schmilosophy.  Fit these factors as a fixed effect,
> anything else is too dangerous/misleading/unworkable.
>  2. proceed with the 'standard' mixed model (lme4, nlme, PROC MIXED,
> ...) and hope it doesn't break.  Ignore warnings.
>  3. use Bayesian-computational approaches (MCMCglmm, WinBUGS, AD Model
> Builder with post-hoc MCMC calculation? Data cloning?)?  Possibly with
> half-Cauchy priors on variance as recommended by Gelman [Bayesian
> Analysis (2006) 1, Number 3, pp. 515–533]?

Schmilosophically speaking, option #3 has a set of interesting
features:

- It entails the creation and fitting of a full joint probability model.
Internal consistency is guaranteed.

- Bayesian modeling tools (especially BUGS) offer enough flexibility to
*require* explicit *choices* of model assumptions. For example, the
"standard" choice of a normal distribution for level-II effects has to
be *explicitly* written down by the modeler, which gives him/her an
opportunity to consider and/or justify that choice. The same is true of
how the nuisance parameters are modeled.

- There is no "formal" distinction between fixed and random effects;
the latter are given a distribution (to be fitted), whereas the former
are not.
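As an illustration, in BUGS/JAGS notation the distinction is purely a
matter of which priors you write. A minimal sketch of a varying-intercept
model (all names below — y, x, fac, N, K, and the prior scales — are
placeholders of my own, not anything from this thread):

```
model {
  for (i in 1:N) {
    mu[i] <- b0 + b1 * x[i] + u[fac[i]]
    y[i] ~ dnorm(mu[i], tau.res)
  }
  for (j in 1:K) {
    u[j] ~ dnorm(0, tau.u)                 # "random": given a fitted distribution
  }
  b0 ~ dnorm(0, 1.0E-6)                    # "fixed": merely a vague prior
  b1 ~ dnorm(0, 1.0E-6)
  sigma.u ~ dt(0, pow(25, -2), 1) T(0,)    # half-Cauchy(25) on the sd,
  tau.u   <- pow(sigma.u, -2)              # as suggested by Gelman (2006)
  tau.res ~ dgamma(0.001, 0.001)
}
```

Nothing but the priors distinguishes u[] from b0 and b1 here.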

However:

- Getting such a model to converge can be hard (especially with BUGS).
Current tools demand a real (maybe excessive) understanding of the
underlying numerical-analysis problems in order to choose an *efficient*
model expression, and some "tricks" used to obtain convergence (such as
deliberately rendering a model unidentifiable...) are difficult to
justify from a modeling standpoint. Again, Gelman & Hill's comparison of
current Bayesian tools to early-generation regression software seems
quite apt.

- The choices made about the shapes of distributions, relationships,
etc. should be submitted to a sensitivity analysis, which again adds to
the statistician's workload.

In short, option #3 is the *expensive* way of (maybe) getting the most
precise honest answer to your problem... if such a thing exists and is
reachable with current numerical tools.
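For those who would rather stay within R, option #3 can also be sketched
with MCMCglmm. The data names (y, x, fac, dat) and the prior scale of 25
below are illustrative assumptions of mine; the parameter-expanded G
prior is the usual device for approximating a half-Cauchy prior on the
group-level standard deviation:

```r
library(MCMCglmm)

## parameter-expanded prior, ~ half-Cauchy(25) on the sd of the fac effects
prior <- list(G = list(G1 = list(V = 1, nu = 1,
                                 alpha.mu = 0, alpha.V = 25^2)),
              R = list(V = 1, nu = 0.002))

m <- MCMCglmm(y ~ x, random = ~ fac, data = dat,
              prior = prior, nitt = 13000, burnin = 3000, thin = 10)
summary(m$VCV)   # posterior of the variance components
```

With few factor levels, inspecting the whole posterior of the variance
(rather than a point estimate) is precisely what this buys you.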

Murphy's law ensures that option #2 will "break" (and warnings will be
ignored) precisely when those warnings mean something important.

But option #1 is *also* an acceptable one: using it entails modeling
*conditionally on your experimental setup*. You won't be able to state
(directly) an estimate of the possible "population" effects of the
formerly-random effects, and the validity of your inferences about the
really-fixed effects will be formally limited to the particular
population defined by these formerly-random effects. But that might be
okay if you have other information allowing you (and your reader) to
independently assess the possible value of an extrapolation from this
particular subpopulation to a hypothetical population.

In short, option #1 is a cheap way of getting a possibly acceptable
approximate solution to your problem, whose value has to be assessed by
other means.

Of course, you won't benefit from partial pooling, and serious
inter-unit interactions will mess up your inferences...
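(For the record, option #1 is a one-liner in R; the names y, x, fac and
dat are again illustrative:)

```r
## option #1: the few-level factor enters as an ordinary fixed effect
fit1 <- lm(y ~ x + fac, data = dat)   # no pooling across fac levels
summary(fit1)

## compare with option #2, the mixed model:
## fit2 <- lmer(y ~ x + (1 | fac), data = dat)
```

Each level then gets its own unshrunken coefficient, which is exactly
the "no partial pooling" caveat above.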

HTH,

					Emmanuel Charpentier

> Gabor Grothendieck wrote:
> > Here is a simulation of 10k cases using 4 and 50 level factors for the
> > random effect.  With 4 levels there are numerical problems and the
> > accuracy of the random effect is terrible.  With 50 levels there are
> > no numerical problems and the accuracy is much better.
> > 
> >> library(lme4)
> >> set.seed(1)
> >> n <- 10000
> >> k <- 4
> >> f <- function(n, k) {
> > + set.seed(1)
> > + x <- 1:n
> > + fac <- gl(k, 1, n)
> > + fac.eff <- rnorm(k, 0, 4)[fac]
> > + e <- rnorm(n)
> > + y <- 1 + 2 * x + fac.eff + e
> > + lmer(y ~ x + (1|fac))
> > + }
> > 
> >> # simulation with 4 level random effect
> >> f(n, 4)
> > Linear mixed model fit by REML
> > Formula: y ~ x + (1 | fac)
> >    AIC   BIC logLik deviance REMLdev
> >  28733 28762 -14363    28702   28725
> > Random effects:
> >  Groups   Name        Variance Std.Dev.
> >  fac      (Intercept) 1.1162   1.0565
> >  Residual             1.0298   1.0148
> > Number of obs: 10000, groups: fac, 4
> > 
> > Fixed effects:
> >              Estimate Std. Error t value
> > (Intercept) 1.313e+00  5.286e-01       2
> > x           2.000e+00  3.515e-06  568923
> > 
> > Correlation of Fixed Effects:
> >   (Intr)
> > x -0.033
> > Warning message:
> > In mer_finalize(ans) : false convergence (8)
> > 
> >> # simulation with 50 level random effect
> >> f(n, 50)
> > Linear mixed model fit by REML
> > Formula: y ~ x + (1 | fac)
> >    AIC   BIC logLik deviance REMLdev
> >  29040 29069 -14516    29009   29032
> > Random effects:
> >  Groups   Name        Variance Std.Dev.
> >  fac      (Intercept) 11.2016  3.3469
> >  Residual              1.0251  1.0125
> > Number of obs: 10000, groups: fac, 50
> > 
> > Fixed effects:
> >              Estimate Std. Error t value
> > (Intercept) 1.396e+00  4.738e-01       3
> > x           2.000e+00  3.507e-06  570242
> > 
> > Correlation of Fixed Effects:
> >   (Intr)
> > x -0.037
> > 
> > 
> > 
> > 
> > On Fri, Apr 23, 2010 at 9:38 AM, Schultz, Mark R. <Mark.Schultz2 at va.gov> wrote:
> >> I just read a post by Andrew Dolman suggesting that a factor with only 3
> >> levels should be treated as a fixed effect. This seems to be a perennial
> >> question with mixed models. I'd really like to hear opinions from
> >> several experts as to whether there is a consensus on the topic. It
> >> really makes me uncomfortable that such an important modeling decision
> >> is made with an "ad hoc" heuristic.
> >>
> >>
> >>
> >> Thanks,
> >>
> >> Mark Schultz, Ph.D.
> >>
> >> Bedford VA Hospital
> >>
> >> Bedford, Ma.
> >>
> >>
> >>
> >> _______________________________________________
> >> R-sig-mixed-models at r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> >>
> > 
> 
> 
> -- 
> Ben Bolker
> Associate professor, Biology Dep't, Univ. of Florida
> bolker at ufl.edu / people.biology.ufl.edu/bolker
> GPG key: people.biology.ufl.edu/bolker/benbolker-publickey.asc
> 



