[R-sig-ME] is multicollinearity of fixed effects resolved by random effects

Mon May 19 02:02:09 CEST 2008

I think grouping them into families is a very good idea.
This makes use of prior insight, reduces the number of
parameters to an extent that it becomes more reasonable
to think about fixed effects, and you can look for individual
species that stray from the path laid out for them by their
families.  For the fixed effects analyses that I suggested,
you might do these by families.

Conceptually, there are taxonomic factors that surely
function, no doubt in interaction with location variables,
as fixed effects.  I consider that one ought to start by thinking
of them as fixed effects, unless it can be demonstrated that
data are indistinguishable from random variation.

Maybe however those effects operate more as the level of
genera or families than at the level of species.  Responding
to this point may be a useful aim for the study.

John.

John Maindonald             email: john.maindonald at anu.edu.au
phone : +61 2 (6125)3473    fax  : +61 2(6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.

On 19 May 2008, at 4:04 AM, Jordan Mayor wrote:

> Thank you JM, RK, and DB for your thoughtful responses to my  
> questions.  I am working on posting the file on the web and will  
> respond as soon as it is publicly available - (I could email it  
> direct to those concerned as well).  Until then here is the output  
> requested by DB and RK as well as more information regarding sites,  
> species, and the data:
>
> >ds=d[d$TYPE=="s",] #separates into just sap fungi as used in the  
> example below
> >dm=d[d$TYPE=="m",] #separates into just myc fungi (treated as  
> distinct populations in my analyses)
>
> > table(table(ds$SPECIES))
>
>     0      1    2   3   5   6
> 393 161  45   7   1   1
> # As seen all species are not present at all the sites - this is  
> partly why I thought to nest SPECIES within SITE.  The sites range  
> widely from boreal tundra, to temperate California, to tropical  
> forests of Borneo and Guyana.  Most fungi are not so cosmopolitan as  
> to even potentially be present everywhere - let alone be collected  
> during brief collecting forays.  One site (pine forest in CA) didn't  
> even list  fungal names.  All fungi were listed as mycorrhizal (m),  
> saprotrophic (s), or of unknown ecological roles (unk).  Most of the  
> genera (or families) are present at all sites but that biologically  
> coarseness concerns me.  If, as pointed out by RK the ideal  
> situation would be every species present at every site then perhaps  
> creating a new column containing families or genera will move the  
> dataset toward that direction.
>
> > xtabs(~SITE,ds)
> SITE
>                              Aheden                             
> Aheden 2
>                                    
> 4                                   0
>    Ashiu (temp deciduous broadleaf)        Betsele
>                                   
> 21                                   0
>                      Breuil, France                       Chiba
>                                   
> 14                                   6
>                                   d                              
> Flakaliden
>                                   
> 23                                   0
>                 Glacier Bay, Alaska                Guyana
>                                   4                                   
> 20
>                                   h                     heath  
> tundra, subarctic Sweden
>                                   
> 38                                   4
>                           Kagoshima                        
> Kulbacksliden
>                                    
> 1                                   0
>                               Kyoto                        Lamar  
> Haines
>                                  26                                   
> 13
>           Lambir (lowland tropical)               Miyajima
>                                   
> 14                                   1
>                            Norikura                        Norrliden
>                                    
> 3                                   0
>                             Okinawa       Ontake (subalpine  
> coniferous)
>                                    
> 0                                   8
>                               Oodai                  pine forests in  
> CA
>                                   2                                   
> 25
>                           Shirahama                     nowbowl
>                                   2                                   
> 13
>                   Spruce plantation                   Stadsskogen
>                                  17                                   
> 13
>                         Svartberget                     Tanigawa
>                                    
> 0                                   6
> tussock tundra near Toolik Lake, AK      Vilan
>                                    
> 5                                   0
>                         Woods Creek
>                                  25                     # the zeros  
> here are because some sites have no X13C data - only X15N
>
> I did try running models with centered predictors, as suggested by  
> RK - see below.  I found that the AIC scores were much larger  
> however when the predictors were centered using my method.  I  
> centered by taking the mean of mycorrhizal and saprotrophic fungal  
> groups at each site - then I subtracted each (m) or (s) fungus from  
> those same group means within each site.  Because the (m) and (s)  
> fungi have unique sources of carbon, and nitrogen and these are  
> reflected in their isotope values (X13C, X15N), I deemed this  
> centering level to be appropriatein order to preserve the magnitude  
> of difference between the groups.
>
> > mix.model1  # Raw isotope values used
> Linear mixed model fit by maximum likelihood
> Formula: X13C ~ MAT + MAP + LAT + (1 | SPECIES) + (1 | SITE)
>    Data: ds
>   AIC   BIC    logLik     deviance  REMLdev
>  1016 1041 -500.9     1002         1031
> Random effects:
>  Groups   Name        Variance Std.Dev.
>  SPECIES  (Intercept) 0.62149  0.78835
>  SITE     (Intercept) 0.34885  0.59063
>  Residual             1.28911  1.13539
> Number of obs: 283, groups: SPECIES, 215; SITE, 24
>
> Fixed effects:
>               Estimate Std. Error t value
> (Intercept) -2.207e+01  1.121e+00 -19.695
> MAT         -6.418e-02  3.833e-02  -1.675
> MAP          3.137e-05  2.100e-04   0.149
> LAT         -8.690e-03  1.778e-02  -0.489
>
> Correlation of Fixed Effects:
>           (Intr)       MAT     MAP
> MAT  -0.707
> MAP  -0.401  -0.247
> LAT   -0.959  0.692   0.272
>
> > mix.model1a # Group centered within each site
> Linear mixed model fit by maximum likelihood
> Formula: STND_13c ~ MAT + MAP + LAT + (1 | SPECIES) + (1 | SITE)
>    Data: ds
>   AIC   BIC    logLik     deviance  REMLdev
>  2623 2649  -1305     2609         2620
> Random effects:
>  Groups   Name        Variance Std.Dev.
>  SPECIES  (Intercept) 301.9720 17.3773
>  SITE     (Intercept)   5.1294  2.2648
>  Residual             327.3857 18.0938
> Number of obs: 283, groups: SPECIES, 215; SITE, 24
>
> Fixed effects:
>              Estimate Std. Error t value
> (Intercept) 3.463e+01  1.166e+01  2.9713
> MAT         3.899e-01  4.330e-01  0.9006
> MAP         2.543e-05  1.811e-03  0.0140
> LAT         3.235e-01  1.867e-01  1.7324
>
> Correlation of Fixed Effects:
>          (Intr)       MAT     MAP
> MAT -0.779
> MAP -0.230  -0.300
> LAT  -0.955   0.732   0.116
>
> > anova(mix.model1,mix.model1a)
> Data: ds
> Models:
> mix.model1: X13C ~ MAT + MAP + LAT + (1 | SPECIES) + (1 | SITE)
> mix.model1a: STND_13c ~ MAT + MAP + LAT + (1 | SPECIES) + (1 | SITE)
>                             Df      AIC         BIC          logLik   
> Chisq Chi Df  Pr(>Chisq)
> mix.model1.p    7      1015.85  1041.37  -500.93
> mix.model1a.p  7      2623.49  2649.00 -1304.74     0      0       <  
> 2.2e-16 ***
>
> # As seen above, the model using my centered values performed  
> poorly.  My interpretation is that the centering removed the very  
> variability associated with climate I am trying to predict!
>
> #In addition, I compared the other models mentioned by RK using the  
> raw isotope values:
>
> > anova(mix.model1,mix.model2,mix.model3)
> Data: ds
> Models:
> mix.model1: X13C ~ MAT + MAP + LAT + (1 | SPECIES) + (1 | SITE)
> mix.model2: X13C ~ (MAT + MAP + LAT)^2 + (1 | SPECIES) + (1 | SITE)
> mix.model3: X13C ~ MAT * MAP * LAT + (1 | SPECIES) + (1 | SITE)
>                          Df     AIC         BIC            
> logLik      Chisq    Chi Df   Pr(>Chisq)
> mix.model1.p  7     1015.85  1041.37    -500.93
> mix.model2.p 10    978.21    1014.66    -479.10   43.645       
> 3       1.796e-09 ***
> mix.model3.p 11    980.18    1020.28    -479.09   0.023         
> 1        0.8794
>
> # I have refrained from trying the models mentioned by DB in order  
> to protect his rights to not commit the "capital mistake" ;)  
> however, the three predictors (MAT, MAP, LAT) do indeed seem to  
> overplot - I will try to select only one perhaps in my final models.
>
> # Again - thank you all for your help on this.
>
> -- 
> Jordan Mayor, Ph.D. Candidate
> Ecosystem Dynamics Research Lab
> Department of Botany, University of Florida
> Gainesville, FL 32611