[R-sig-ME] is multicollinearity of fixed effects resolved by random effects

Sun May 18 12:35:41 CEST 2008

I am not sure how the 800 SPECIES and 24 SITES relate to each other
in this email. The following proposal assumes that a given SPECIES is
observed at each or at least quite a few of the 24 sites (i.e., I
assume you have several, ideally up to 24, measures for each species).
If each species occurs at precisely one site and if you have only one
measure for each species at this site (i.e., if you have exactly 800
observations), then SPECIES should not be part of the model. So let's
assume you have several measures for each species.
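
A quick cross-tabulation shows the SPECIES-by-SITE layout before any
model is fitted. The sketch below uses a small simulated data frame
(ds_demo, a made-up stand-in for ds); only the column names SPECIES and
SITE come from the original post.

```r
# Hypothetical stand-in for `ds`: up to 6 species, 4 sites, unbalanced.
set.seed(1)
ds_demo <- data.frame(
  SPECIES = factor(sample(paste0("sp", 1:6), 40, replace = TRUE)),
  SITE    = factor(sample(paste0("site", 1:4), 40, replace = TRUE))
)

# Number of observations in each SPECIES x SITE cell
cell_counts <- table(ds_demo$SPECIES, ds_demo$SITE)
print(cell_counts)

# Number of distinct sites at which each species was observed
sites_per_species <- rowSums(cell_counts > 0)
print(sites_per_species)

# Observations per species; if these were all 1, a SPECIES effect
# could not be separated from the residual
obs_per_species <- rowSums(cell_counts)
print(obs_per_species)
```

With the real data, replace ds_demo by ds. Many empty cells point to a
partially crossed design; exactly one site per species points to nesting.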

First, treat site and species as (possibly partially) crossed random
effects. This assumes that the species in site 1 and site 2 are pretty
much the same. Nesting would be required if species 1 is a
different beast in site 1 and in site 2. Think of student 1 in class
1 and student 1 in class 2; they are likely different persons. However,
if you observe student 1 in a music class and the same student 1 in a
math class, they are obviously the same person. In lmer, if you give
units the same identification, it will assume they are the same unit.
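
To make the labelling point concrete, here is a minimal base-R sketch
with a made-up toy data frame (toy and SPECIES_IN_SITE are hypothetical
names). It counts how many grouping units each coding implies. In lme4,
(1|SITE/SPECIES) is shorthand for (1|SITE) + (1|SITE:SPECIES), which
corresponds to the nested coding below.

```r
# Hypothetical toy data: species labels are reused across sites.
toy <- data.frame(
  SITE    = factor(c("A", "A", "B", "B", "C", "C")),
  SPECIES = factor(c("sp1", "sp2", "sp1", "sp3", "sp2", "sp3"))
)

# Crossed view: "sp1" at site A and "sp1" at site B are the same unit.
n_crossed <- nlevels(toy$SPECIES)

# Nested view: every SITE-SPECIES combination is a unit of its own.
toy$SPECIES_IN_SITE <- interaction(toy$SITE, toy$SPECIES, drop = TRUE)
n_nested <- nlevels(toy$SPECIES_IN_SITE)

cat("crossed units:", n_crossed, " nested units:", n_nested, "\n")
```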

Second, definitely center your predictors prior to any analyses. Here
you need to think about the level at which you want to center and you
will need to read up on this. For starters, centering them on their
grand mean should not be wrong, but it may not be optimal. (This
becomes an issue especially if you want to include covariates that
describe the sites, i.e., that are identical for all SPECIES observed
at a given SITE; actually I suspect that your three predictors are
such variables.)
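
A small sketch of the centering choices, assuming MAT is a site-level
covariate (constant within SITE). The data frame demo and its values
are made up for illustration.

```r
# Hypothetical data: three sites with 3, 2 and 4 observations.
site_mat <- c(s1 = 2.5, s2 = 10.0, s3 = 18.5)
demo <- data.frame(SITE = factor(rep(names(site_mat), times = c(3, 2, 4))))
demo$MAT <- unname(site_mat[as.character(demo$SITE)])

# Grand-mean centering: subtract the mean over all observations.
demo$MAT_gmc <- demo$MAT - mean(demo$MAT)

# Centering on the mean of the site means (each site counts once).
demo$MAT_smc <- demo$MAT - mean(tapply(demo$MAT, demo$SITE, mean))

# Within-site centering: identically zero for a site-level covariate.
demo$MAT_wsc <- demo$MAT - ave(demo$MAT, demo$SITE)

print(demo)
```

With unequal numbers of observations per site the first two versions
differ, and the within-site version carries no information for a
predictor that does not vary within sites.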

Third, if collinearity is very strong, you may want to think of
combining your predictors into a single one or collapsing two of them;
after all, they seem to be getting at the same thing. See also John
Maindonald's suggestions on this topic. Definitely get a good idea
of how your predictors relate to the dependent variable. You may
also want to check whether higher-order polynomial terms are likely
required in the fixed-effects part (e.g., a quadratic trend for MAT).
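
One possible way to look at the collinearity and to combine the
predictors, sketched in base R on simulated site-level values. The
relationships among MAT, MAP and LAT below are invented, and CLIM_PC1
is a hypothetical name for the combined predictor; a principal
component is only one of several ways to collapse them.

```r
set.seed(42)
n_sites <- 24
LAT <- runif(n_sites, 0, 70)
MAT <- 27 - 0.4 * LAT + rnorm(n_sites, sd = 2)      # made-up relationship
MAP <- 2500 - 25 * LAT + rnorm(n_sites, sd = 300)   # made-up relationship
site_df <- data.frame(MAT, MAP, LAT)

# Pairwise correlations among the predictors
print(round(cor(site_df), 2))

# Variance inflation factor: 1 / (1 - R^2), where R^2 comes from
# regressing each predictor on the remaining ones
vif <- sapply(names(site_df), function(v) {
  others <- setdiff(names(site_df), v)
  r2 <- summary(lm(reformulate(others, response = v),
                   data = site_df))$r.squared
  1 / (1 - r2)
})
print(round(vif, 1))

# Combine: first principal component of the standardized predictors
pca <- prcomp(site_df, center = TRUE, scale. = TRUE)
site_df$CLIM_PC1 <- pca$x[, 1]
print(summary(pca)$importance["Proportion of Variance", ])
```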

# REML = FALSE requests ML estimation in current lme4
# (formerly method = "ML")
mix.model1 <- lmer(X13C ~ MAT + MAP + LAT + (1 | SPECIES) + (1 | SITE),
                   data = ds, REML = FALSE)
mix.model2 <- lmer(X13C ~ (MAT + MAP + LAT)^2 + (1 | SPECIES) + (1 | SITE),
                   data = ds, REML = FALSE)
and, possibly,
mix.model3 <- lmer(X13C ~ MAT * MAP * LAT + (1 | SPECIES) + (1 | SITE),
                   data = ds, REML = FALSE)
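
To see which fixed-effects terms each of these formulas implies, and
how they differ from an interaction-only specification such as
MAT:MAP:LAT, one can expand them with terms(). This is a base-R sketch;
term_labels is a helper name introduced here.

```r
# Fixed-effects terms implied by each formula operator
term_labels <- function(f) attr(terms(f), "term.labels")

additive  <- term_labels(~ MAT + MAP + LAT)      # main effects only
two_way   <- term_labels(~ (MAT + MAP + LAT)^2)  # plus two-way interactions
full      <- term_labels(~ MAT * MAP * LAT)      # plus the three-way term
only_3way <- term_labels(~ MAT:MAP:LAT)          # three-way term, no main effects

print(additive)
print(two_way)
print(full)
print(only_3way)
```

The ":" operator on its own adds just the product term without the
lower-order terms, which is rarely what one wants.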

Further things:
After you feel happy with your fixed-effects part (or better: once you
have developed a substantively and statistically defensible
representation), consider including (some of) the predictors as varying
slopes in the random part of the model. I would start with those that
have the largest fixed effects, e.g.,

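# Varying MAT slopes for both SPECIES and SITE
mix.model4 <- lmer(X13C ~ MAT + MAP + LAT + (MAT | SPECIES) + (MAT | SITE),
                   data = ds, REML = FALSE)
or
# Varying MAT slopes for SPECIES only
mix.model5 <- lmer(X13C ~ MAT + MAP + LAT + (MAT | SPECIES) + (1 | SITE),
                   data = ds, REML = FALSE)
or
# Varying MAT slopes for SITE only
mix.model6 <- lmer(X13C ~ MAT + MAP + LAT + (1 | SPECIES) + (MAT | SITE),
                   data = ds, REML = FALSE)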

etc.
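
A rough sketch of how such a comparison could look, on simulated data
with crossed SPECIES and SITE and species-specific MAT slopes. All
numbers are invented, grid stands in for ds, and the model fits are
skipped if lme4 is not installed.

```r
set.seed(7)
n_species <- 30
n_sites   <- 12
grid <- expand.grid(SPECIES = factor(seq_len(n_species)),
                    SITE    = factor(seq_len(n_sites)))

mat_site <- rnorm(n_sites)                  # centered site-level MAT
sp_int   <- rnorm(n_species, sd = 1.0)      # species intercepts
sp_slope <- rnorm(n_species, sd = 0.5)      # species-specific MAT slopes
site_int <- rnorm(n_sites, sd = 0.7)        # site intercepts

grid$MAT  <- mat_site[grid$SITE]
grid$X13C <- -25 + 0.8 * grid$MAT +
  sp_int[grid$SPECIES] + sp_slope[grid$SPECIES] * grid$MAT +
  site_int[grid$SITE] + rnorm(nrow(grid), sd = 0.5)

if (requireNamespace("lme4", quietly = TRUE)) {
  # Random intercepts only
  m_int <- lme4::lmer(X13C ~ MAT + (1 | SPECIES) + (1 | SITE),
                      data = grid, REML = FALSE)
  # Adds a varying MAT slope for SPECIES
  m_slope <- lme4::lmer(X13C ~ MAT + (MAT | SPECIES) + (1 | SITE),
                        data = grid, REML = FALSE)
  print(anova(m_int, m_slope))     # likelihood-ratio comparison
  print(lme4::VarCorr(m_slope))    # estimated variance components
}
```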

Definitely check the distribution of the residuals of your final
model(s). You may need to think about a transformation of your
dependent variable.
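
A minimal sketch of numeric residual checks. check_residuals is a
hypothetical helper; it relies only on resid() and fitted(), which are
available for lmer fits as well. The demonstration uses lm() on
simulated skewed data as a stand-in, and the log transformation is only
illustrative (it would not apply directly to negative values).

```r
# Residual summary for any fit that supports resid() and fitted()
check_residuals <- function(fit) {
  r  <- resid(fit)
  qq <- qqnorm(r, plot.it = FALSE)
  list(
    qq_correlation   = cor(qq$x, qq$y),               # near 1 if roughly normal
    skewness         = mean((r - mean(r))^3) / sd(r)^3,
    spread_vs_fitted = cor(abs(r), fitted(fit))       # near 0 if variance is stable
  )
}

set.seed(3)
x <- runif(200)
y_skewed <- exp(1 + 2 * x + rnorm(200, sd = 0.6))     # made-up skewed response

raw_fit <- lm(y_skewed ~ x)
log_fit <- lm(log(y_skewed) ~ x)

str(check_residuals(raw_fit))
str(check_residuals(log_fit))
```

Plots such as qqnorm(resid(fit)) and plot(fitted(fit), resid(fit)) give
the same information graphically.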

One question for John Maindonald: Why would you include SPECIES as a
fixed effect? It leads to 799 parameters being estimated. Or am I
missing something here?
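
The count of 799 can be checked directly, assuming R's default
treatment contrasts: a factor with k levels contributes k - 1 columns
to the design matrix besides the intercept. The species labels below
are made up.

```r
# Hypothetical factor with 800 species, one row per species
SPECIES <- factor(paste0("sp", 1:800))

# Design matrix for a fixed SPECIES effect (intercept + contrasts)
X <- model.matrix(~ SPECIES)

n_species_params <- ncol(X) - 1   # columns other than the intercept
print(n_species_params)
```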

Reinhold Kliegl

On Sun, May 18, 2008 at 3:07 AM, John Maindonald
<john.maindonald at anu.edu.au> wrote:
> Don't you want MAT+MAP+LAT, possibly with first order
> interactions in also?  If you still find multicollinearity, look
> at the relationship between MAT, MAP and LAT, probably
> using regression.  If you still find multi-collinearity, you
> need to work out which variables or constructed variables
> are most meaningful to retain.
>
> Surely you should allow for a fixed effect of species.  So my
> guess would be:
>
> mix.model1 <- lmer(X13C~ MAT+MAP+LAT + SPECIES + (1|SITE/SPECIES),
>                    method ="ML", data=ds)
> or maybe
> mix.model1 <- lmer(X13C~ (MAT+MAP+LAT + SPECIES)^2 + (1|SITE/SPECIES),
>                    method ="ML", data=ds)
>
> But before you go too far, consider whether some species may
> show very consistent variation across sites, whereas others may
> be quite incognisant of sites; they're the tourists who live in
> expensive luxury hotels that are the same wherever they go!
> Fit fixed effect models for each species, then xyplot() to look at
> the pattern of change of parameter estimates across species.
>
> John Maindonald             email: john.maindonald at anu.edu.au
> phone : +61 2 (6125)3473    fax  : +61 2(6125)5549
> Centre for Mathematics & Its Applications, Room 1194,
> John Dedman Mathematical Sciences Building (Building 27)
> Australian National University, Canberra ACT 0200.
>
>
> On 18 May 2008, at 5:21 AM, Jordan Mayor wrote:
>
>> Hello, I am currently using mixed models (lme4) to explain variability in
>> fungal isotope patterns in 800 fungal SPECIES across 24 SITES in the world.
>> I have thus decided to nest the species within site as hierarchical *random
>> effects*.  I am just trying to explain the variability in fungal isotope
>> values, not necessarily predict values of fungi from other areas of the
>> world.
>>
>> My *fixed effects* include mean annual temperature, mean annual
>> precipitation, and latitude.  Perhaps obviously, all three suffer from
>> strong multicollinearity problems.
>>
>> *My question is:*  Should I simply omit using fixed effects additively?  In
>> other words, is it valid to simply use the fixed effects singly or as
>> interactions (see code below), or do the random effects take care of the
>> collinearity issue during adjustment of model error variance, allowing me to
>> compare full-factorial models instead?
>>
>> Thanks in advance to all those with advice or references.
>>
>> --
>> Jordan Mayor
>>
>> My code:
>>
>>> names(ds)
>> [1] "STUDY"    "SPECIES"  "SITE"     "MAT"      "MAP"      "X13C"     "X15N"     "LAT"
>>
>>> mix.model1=lmer(X13C ~ MAT:MAP:LAT + (1|SITE/SPECIES), method ="ML", data=ds)
>>> mix.model2=lmer(X13C ~ MAT:MAP + (1|SITE/SPECIES), method ="ML", data=ds)
>>> mix.model3=lmer(X13C ~ MAP:LAT + (1|SITE/SPECIES), method ="ML", data=ds)
>>> mix.model4=lmer(X13C ~ MAT:LAT + (1|SITE/SPECIES), method ="ML", data=ds)
>>> mix.model5=lmer(X13C ~ MAT + (1|SITE/SPECIES), method ="ML", data=ds)
>>> mix.model6=lmer(X13C ~ MAP + (1|SITE/SPECIES), method ="ML", data=ds)
>>> mix.model7=lmer(X13C ~ LAT + (1|SITE/SPECIES), method ="ML", data=ds)
>>> mix.model8=lmer(X13C ~ (1|SITE/SPECIES), data=ds)
>>>
>>
>> anova(mix.model1,mix.model2,mix.model3,mix.model4,mix.model5,mix.model6,mix.model7,mix.model8,data=ds)
>>
>>               Df  AIC        BIC      logLik   Chisq   Chi Df  Pr(>Chisq)
>> mix.model8.p  4   1024.53    1039.11  -508.26
>> mix.model1.p  5   1025.65    1043.87  -507.82  0.8800  1       0.3482
>> mix.model2.p  5   *1016.43*  1034.66  -503.22  9.2145  0       <2e-16 ***
>> mix.model3.p  5   1022.63    1040.85  -506.31  0.0000  0       <2e-16 ***
>> mix.model4.p  5   1023.09    1041.31  -506.54  0.0000  0       <2e-16 ***
>> mix.model5.p  5   1023.25    1041.48  -506.62  0.0000  0       <2e-16 ***
>> mix.model6.p  5   1025.80    1044.02  -507.90  0.0000  0       <2e-16 ***
>> mix.model7.p  5   1025.32    1043.55  -507.66  0.4793  0       <2e-16 ***
>>
>> _______________________________________________
>> R-sig-mixed-models at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models