[R-sig-ME] is multicollinearity of fixed effects resolved by random effects

Sun May 18 17:45:46 CEST 2008

On Sun, May 18, 2008 at 5:35 AM, Reinhold Kliegl
<reinhold.kliegl at gmail.com> wrote:
> I am not sure about how 800 SPECIES and 24 SITEs relate to each other
> in this email. The following proposal assumes thtat a given SPECIES is
> observed at each or at least quite a few of the 24 sites (i.e., I
> assume you have several, ideally up to 24 measures for each species).
> If each species occurs at precisely one site and if you have only one
> measure for each species at this site (i.e., if you have excatly 800
> observations), then SPECIES is not be part of the model. So let's
> assume you have several measures for each species.
>
> First, treat site and species as (possibly partially) crossed random
> effects.  Assuming that the species in site 1 and site 2 are pretty
> much the same. The nesting would be required if species 1 is a
> different beast in site 1 and in site 2. Think of a student 1 in class
> 1 and student 1 in class2; they are likely different persons. However,
> if you observe student 1 in a music class and the same student 1 in a
> math class; they are obviously the same person. In lmer, if you give
> units the same identification it will assume they are the same unit.
>
> Second, definitely center your predictors prior to any analyses. Here
> you need to think about the level at which you want to center and you
> will need to read up on this. For starters, centering them on their
> grand mean should not be wrong, but it may not be optimal. (This
> becomes an issue especially if you want include covariates that
> describe the sites, i.e., that are identical for all SPECIES observed
> at a given SITE; actually I suspect that your three predictors are
> such variables.)
>
> Third, If collinearity is very strong, you may want to think of
> combining your predictors into a single one or collaps two of them;
> after all they seem to be getting at the same thing. See also John
> Maindonald's suggestions on this topic. Definitely get a good idea
> about how you predictors relate to the dependent variable. You may
> also want to check whether high-order polynomials are likely required
> to be in the fixed-effects part (e.g., quadratic trend for MAT, etc.).
>
> mix.model1 <- lmer(X13C~ MAT+MAP+LAT + (1|SPECIES) + (1|SITE), method
> ="ML", data=ds)
> mix.model2 <- lmer(X13C~ (MAT+MAP+LAT)^2 + (1|SPECIES) + (1|SITE),
> method ="ML", data=ds)
> and, possibly,
> mix.model3 <- lmer(X13C~ MAT*MAP*LAT + (1|SPECIES) + (1|SITE), method
> ="ML", data=ds)
>
> Further things:
> After you feel happy with your fixed-effects part (or better: once you
> developed a substantively and statistically defensible
> representation), consider including (some of) them as varying slopes
> into the random part of the model. I would start with those that have
> the largest fixed effects;  e..g.
>
> mix.model4 <- lmer(X13C~ MAT+MAP+LAT + (MAT|SPECIES) + (MAT|SITE),
> method ="ML", data=ds)
> or
> mix.model5 <- lmer(X13C~ MAT+MAP+LAT + (MAT|SPECIES) + (1|SITE),
> method ="ML", data=ds)
> or
> mix.model6 <- lmer(X13C~ MAT+MAP+LAT + (1|SPECIES) + (MAT|SITE),
> method ="ML", data=ds)
>
> etc.
>
> Definitely check the distribution of the residuals of your final
> model(s). You may need to think about a transformation of your
> dependent variable.
>
> One question for John Maindonald: Why would you include SPECIES as a
> fixed effect? It leads to 799 parameters being estimated. Or am I
> missing something here.

I was thinking the same thing.  The total number of observations
wasn't stated in the original message and I think that would be
important in deciding exactly how to go about modeling the data.   If
Jordan could make the data available on a web site as a text file or a
saved R data object I think it would help focus the discussion.  If
that is not possible I would like to see the output of

table(table(ds$SPECIES))  # frequency table of number of observations
per species
xtabs(~ SITE, ds)             # table of number of observations per site

Also, I assume that LAT, MAT and MAP do not change within SITE so
there be at most 24 unique values for those covariates.  It might be
interesting to look at scatterplots, perhaps with just smoother lines
and not the points, if there would be a considerable amount of
overplotting.  That is

library(lattice)
xyplot(X13C ~ MAP, ds, type = c("g", "smooth"))  # repeat for MAT and LAT

Speculating in advance of seeing the data, which Sherlock Holmes
characterized as a "capital mistake", I would be inclined to start
with random effects of the form
   (1|SPECIES) + (1|SITE)
or
   (1|SPECIES/SITE)
which expands to
   (1|SPECIES) + (1|SITE:SPECIES)

It seems more likely to me that I would want random effects for the
800 species than for the 24 sites but I am committing the "capital
mistake" and will not speculate further.

> On Sun, May 18, 2008 at 3:07 AM, John Maindonald
> <john.maindonald at anu.edu.au> wrote:
>> Don't you want MAT+MAP+LAT, possibly with first order
>> interactions in also?  If you still find multicollinearity, look
>> at the relationship between MAT, MAP and LAT, probably
>> using regression.  If you still find multi-collinearity, you
>> need to work out which variables or constructed variables
>> are most meaningful to retain.
>>
>> Surely you should allow for a fixed effect of species.  So my
>> guess would be:
>>
>> mix.model1 <- lmer(X13C~ MAT+MAP+LAT + SPECIES + (1|SITE/SPECIES), method
>> ="ML", data=ds)
>> or maybe
>> mix.model1 <- lmer(X13C~ (MAT+MAP+LAT + SPECIES)^2 + (1|SITE/SPECIES),
>> method ="ML", data=ds)
>>
>> But before you go too far, consider whether some species may
>> show very consistent variation across sites, whereas others may
>> be quite incognisant of sites; they're the tourists who live in
>> expensive luxury hotels that are the same wherever they go!
>> Fit fixed effect models for each species, then xyplot() to look at
>> the pattern of change of parameter estimates across species.
>>
>> John Maindonald             email: john.maindonald at anu.edu.au
>> phone : +61 2 (6125)3473    fax  : +61 2(6125)5549
>> Centre for Mathematics & Its Applications, Room 1194,
>> John Dedman Mathematical Sciences Building (Building 27)
>> Australian National University, Canberra ACT 0200.
>>
>>
>> On 18 May 2008, at 5:21 AM, Jordan Mayor wrote:
>>
>>> Hello, I am currently using mixed models (lme4) to explain variability in
>>> fungal isotope patterns in 800 fungal SPECIES across 24 SITES in the
>>> world.
>>> I have thus decided to nest the species within site as hierarchical
>>> *random
>>> effects*.  I am just trying to explain the variability in fungal isotopes
>>> values not necessarily predict values of fungi from other areas of the
>>> world.
>>>
>>> My *fixed effects* include mean annual temperature, mean annual
>>> precipitation, and latitude.  Perhaps obviously, all three suffer from
>>> strong multicollinearity problems.
>>>
>>> *My question is:*  Should I simply omit using fixed effects additively?
>>>  In
>>> other words, is it valid to simply use the fixed effects singly or as
>>> interactions (see code below) or do the random effects take care of the
>>> collinearity issue during adjustment of model error variance allowing me
>>> to
>>> compare full-factorial models instead?
>>>
>>> Thanks in advance to all those with advice or references.
>>>
>>> --
>>> Jordan Mayor
>>>
>>> My code:
>>>
>>>> names(ds)
>>>
>>> [1] "STUDY"    "SPECIES"  "SITE"     "MAT"      "MAP"        "X13C"
>>> "X15N"    "LAT"
>>>
>>>> mix.model1=lmer(X13C~ MAT:MAP:LAT + (1|SITE/SPECIES), method ="ML",
>>>
>>> data=ds)
>>>>
>>>> mix.model2=lmer(X13C ~ MAT:MAP + (1|SITE/SPECIES), method ="ML", data=ds)
>>>> mix.model3=lmer(X13C ~ MAP:LAT + (1|SITE/SPECIES), method ="ML", data=ds)
>>>> mix.model4=lmer(X13C ~ MAT:LAT + (1|SITE/SPECIES), method ="ML", data=ds)
>>>> mix.model5=lmer(X13C ~ MAT + (1|SITE/SPECIES), method ="ML", data=ds)
>>>> mix.model6=lmer(X13C ~ MAP + (1|SITE/SPECIES), method ="ML", data=ds)
>>>> mix.model7=lmer(X13C ~ LAT + (1|SITE/SPECIES), method ="ML", data=ds)
>>>> mix.model8=lmer(X13C ~ (1|SITE/SPECIES), data=ds)
>>>>
>>>
>>> anova(mix.model1,mix.model2,mix.model3,mix.model4,mix.model5,mix.model6,mix.model7,mix.model8,data=ds)
>>>
>>>                      Df     AIC        BIC         logLik    Chisq Chi
>>> Df   Pr(>Chisq)
>>> mix.model8.p  4  1024.53  1039.11  -508.26
>>> mix.model1.p  5  1025.65  1043.87  -507.82  0.8800      1     0.3482
>>> mix.model2.p  5  *1016.43*  1034.66  -503.22  9.2145      0     <2e-16 ***
>>> mix.model3.p  5  1022.63  1040.85  -506.31  0.0000      0     <2e-16 ***
>>> mix.model4.p  5  1023.09  1041.31  -506.54  0.0000      0     <2e-16 ***
>>> mix.model5.p  5  1023.25  1041.48  -506.62  0.0000      0     <2e-16 ***
>>> mix.model6.p  5  1025.80  1044.02  -507.90  0.0000      0     <2e-16 ***
>>> mix.model7.p  5  1025.32  1043.55  -507.66  0.4793      0     <2e-16 ***
>>>
>>>        [[alternative HTML version deleted]]
>>>
>>> _______________________________________________
>>> R-sig-mixed-models at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>
>> _______________________________________________
>> R-sig-mixed-models at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>