[R-sig-phylo] GLM, GLMM, CAIC, MCMCglmm - AIC model selection for phylogenetic data

Chris Mcowen chrismcowen at gmail.com
Thu Jan 13 17:16:13 CET 2011


The point of doing this was that I am new to the area, so I wanted to make sure that when I have my PhD viva or submit a paper I can defend my method. As you say,
> some might be better than others.

I am currently writing up and am unsure whether I should make a chapter or paper out of comparing the results of the different methods.

So, if I interpret what you are saying correctly, the reason that all the methods I tried (phylogenetic and non-phylogenetic) gave the same answer is specific to my data? If this is the case, and given all the research in this area I imagine it is, I wonder why:

> You can have a strong phylogenetic
> signal; that still (as you have demonstrated) doesn't mean that it will
> overturn the conclusions based on a non-phylogenetic analysis.

What I would like to understand from this is why. Is there a pattern to this, and could randomizations show why it occurs? Or is it something simple that I am making too complicated? I just don't understand how, with a strong phylogenetic signal, phylogenetic and non-phylogenetic methods can give the same answer. And under what circumstances does this not hold true?

Chris



On 13 Jan 2011, at 16:04, Ben Bolker wrote:

On 11-01-13 10:13 AM, Chris Mcowen wrote:
> Hi Ben,
> 
> Thanks for the reply:
> 
> The Pagel test showed that there is a strong phylogenetic signal in my
> data.
> 
> I don't believe I used
>> reasonably minor differences in the way that you account for
>> correlation
> 
> as I used a method that does not account for the correlation at all
> (i.e. a GLM), which selected the same model set and the same "best"
> model as the GLMM and the other "phylogenetic" methods.
> 
> I guess my central point is that I spent a while researching methods
> to deal with phylogenetic structure, and there are various schools of
> thought on the best method. Some say that random effects in GLMMs
> cannot deal with phylogenetic structure, whereas others suggest that
> the CAIC method is best because it actually uses the tree. So from
> these points of view, is there actually a considerable difference
> between the methods?
> 
> If you look at the model set "selected" by each method based on AIC
> differences, they are the same. So is any method really better than
> another for this type of investigation?

OK, "relatively minor" was an overstatement (sorry I overlooked the
fact that you used a GLM as well).  You can have a strong phylogenetic
signal; that still (as you have demonstrated) doesn't mean that it will
overturn the conclusions based on a non-phylogenetic analysis.
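
Ben's point can be illustrated (not proved) with a toy example, sketched here in Python with an entirely hypothetical four-species tree ((A:1,B:1):1,(C:1,D:1):1) and made-up trait values. One commonly made point is that the λ values in Chris's table measure phylogenetic signal in each trait separately, whereas a phylogenetic correction acts only on the residuals of the fitted model. Below, the same data are fitted once by ordinary least squares (species treated as independent, V = identity) and once by generalized least squares with the Brownian-motion covariance V of that tree. The helper names (`solve`, `gls_line`) are illustrative, not from any R package.

```python
# Toy comparison of OLS and Brownian-motion GLS (hypothetical tree and data).


def solve(a, b):
    """Solve a @ z = b for z by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):                   # back substitution
        s = sum(m[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (m[r][n] - s) / m[r][r]
    return z


def dot(p, q):
    return sum(pi * qi for pi, qi in zip(p, q))


def gls_line(x, y, v):
    """Intercept and slope minimizing (y - Xb)' V^-1 (y - Xb), with X = [1, x]."""
    ones = [1.0] * len(x)
    w1, wx, wy = solve(v, ones), solve(v, x), solve(v, y)  # V^-1 times each column
    a11, a12, a22 = dot(ones, w1), dot(ones, wx), dot(x, wx)
    b1, b2 = dot(ones, wy), dot(x, wy)
    det = a11 * a22 - a12 * a12                      # 2x2 normal equations
    return (a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det


# Brownian-motion covariance for ((A:1,B:1):1,(C:1,D:1):1): shared root-to-MRCA length.
V_PHYLO = [[2.0, 1.0, 0.0, 0.0],
           [1.0, 2.0, 0.0, 0.0],
           [0.0, 0.0, 2.0, 1.0],
           [0.0, 0.0, 1.0, 2.0]]
IDENTITY = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]

x = [1.0, 2.0, 3.0, 4.0]   # hypothetical predictor, species A-D
y = [2.1, 3.9, 6.2, 7.8]   # hypothetical response with a strong effect of x

ols_intercept, ols_slope = gls_line(x, y, IDENTITY)  # V = I reduces GLS to OLS
gls_intercept, gls_slope = gls_line(x, y, V_PHYLO)
print(f"OLS slope {ols_slope:.3f}, phylogenetic GLS slope {gls_slope:.3f}")
```

With these numbers it prints an OLS slope of 1.940 and a GLS slope of 1.871: the weighting changes the estimate somewhat, but not its sign or rough size. With a weaker effect or residuals that are strongly clustered on the tree, the two fits can diverge much more.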

 The fact that *for this analysis* all the methods used give the same
answer doesn't invalidate the general point that the different methods
do different things and that some might be better than others.

 I guess I'm curious what conclusions you're drawing from this exercise:

* "am I doing something wrong, or missing something obvious"?
    - I don't think so.
* "why does everyone make such a fuss about the differences"?
    - Because they sometimes (just not in this case) have a big effect
on the conclusions.

   Ben Bolker

> 
> Chris
> 
> On 13 Jan 2011, at 15:00, Ben Bolker wrote:
> 
> On 11-01-13 04:38 AM, Chris Mcowen wrote:
> 
>> Dear list,
> 
>> I am modelling the effect of various life-history traits of
>> species on their extinction rating.
> 
>> My data has a phylogenetic signal (see table below), so to be
>> statistically correct I worked with phylogenetic independent
>> contrasts.
> 
> 
>> From Purvis et al., 2000: "phylogenetic analyses were necessary
>> because of the pseudoreplication and, hence, elevated type I error
>> rates that result from treating species as independent points when
>> relevant variables show a phylogenetic pattern".
> 
>> Variable                Pagel's λ
>> IUCN extinction risk    0.47
>> Breeding system         0.99
>> Endosperm               0.96
>> Floral symmetry         0.93
>> Fruit                   0.97
>> Pollen dispersal        0.99
>> Seasonality             0.52
>> Storage organ           0.85
>> Woodiness               0.61
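
Pagel's λ, as reported in the table above, is usually defined as a multiplier on the off-diagonal elements of the Brownian-motion covariance matrix of the tree: λ = 1 keeps the tree's covariance, λ = 0 makes species independent (a star phylogeny). The Python sketch below only applies that transformation to a hypothetical four-species matrix; it does not estimate λ (that is done by maximum likelihood in the R packages), and the function name `pagel_lambda` is illustrative.

```python
def pagel_lambda(cov, lam):
    """Copy of a phylogenetic covariance matrix with off-diagonal entries scaled by lam."""
    return [[c if i == j else lam * c for j, c in enumerate(row)]
            for i, row in enumerate(cov)]


# Hypothetical Brownian-motion covariance for the tree ((A:1,B:1):1,(C:1,D:1):1).
V = [[2.0, 1.0, 0.0, 0.0],
     [1.0, 2.0, 0.0, 0.0],
     [0.0, 0.0, 2.0, 1.0],
     [0.0, 0.0, 1.0, 2.0]]

for lam in (0.0, 0.47, 1.0):   # 0.47 is the value in the table for IUCN extinction risk
    print(lam, pagel_lambda(V, lam)[0])
```

The diagonal (each species' own variance) is untouched, so a low λ weakens the similarity between relatives without changing the total variance per species.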
> 
>> There are various ways of doing this. One method is to use the
>> package CAIC (Comparative Analysis by Independent Contrasts), which
>> uses the phylogeny to generate the independent contrasts. However,
>> while I was using it I found this, from Sodhi et al., 2008:
> 
>>> It was necessary to decompose the variance across species by
>>> coding the random-effects error structure of the GLMM as a
>>> hierarchical taxonomic (class/order/family) effect (Blackburn &
>>> Duncan, 2001). We had insufficient replication within genera to
>>> include the genus in the nested random effect. Our method is more
>>> appropriate than the independent-contrasts approach (Purvis et
>>> al., 2000) in situations where a complete phylogeny of the study
>>> taxon is unavailable, when categorical variables are included in
>>> the analysis, and when model selection, rather than hypothesis
>>> testing, is the statistical paradigm being used.
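
The independent-contrasts calculation that CAIC is built on (Felsenstein's algorithm, as used by Purvis et al.) can be sketched for one continuous trait on a fully bifurcating tree. At each internal node the two daughter values give a standardized contrast, the node gets a branch-length-weighted average, and its branch is lengthened to reflect the uncertainty in that average. This Python sketch uses a hypothetical tree encoding and made-up values; it does not attempt CAIC's handling of categorical variables or polytomies.

```python
import math


def contrasts(node, traits, out):
    """Post-order pass: return (value, branch length) for node; append contrasts to out.

    A node is (label, length) for a tip or ((left, right), length) for an internal node.
    """
    content, length = node
    if isinstance(content, str):                     # tip: observed trait value
        return traits[content], length
    (xi, vi), (xj, vj) = (contrasts(child, traits, out) for child in content)
    out.append((xi - xj) / math.sqrt(vi + vj))       # standardized contrast
    x_node = (xi / vi + xj / vj) / (1 / vi + 1 / vj) # weighted ancestral estimate
    return x_node, length + vi * vj / (vi + vj)      # extra length for estimation error


# Hypothetical tree ((A:1,B:1):1,(C:1,D:1):1); the root's own length is unused.
LEFT = ((("A", 1.0), ("B", 1.0)), 1.0)
RIGHT = ((("C", 1.0), ("D", 1.0)), 1.0)
TREE = ((LEFT, RIGHT), 0.0)

X = {"A": 1.0, "B": 2.0, "C": 3.0, "D": 4.0}   # hypothetical predictor
Y = {"A": 2.1, "B": 3.9, "C": 6.2, "D": 7.8}   # hypothetical response
cx, cy = [], []
contrasts(TREE, X, cx)
contrasts(TREE, Y, cy)
slope = sum(a * b for a, b in zip(cx, cy)) / sum(a * a for a in cx)  # through the origin
print(f"{len(cx)} contrasts, slope through origin {slope:.3f}")
```

Four species give three contrasts (n - 1), and the regression is forced through the origin because the sign of each contrast is arbitrary. Under Brownian motion this slope coincides with the phylogenetic GLS slope for the same tree and data (here 1.871).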
> 
>> So I gave this a go, using a GLMM with order and family as
>> random effects.
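
What the order/family random effects assume about species can be made concrete by writing down the covariance they imply on the linear-predictor scale: two species share the order variance if they are in the same order, add the family variance if they are also in the same family (families nested within orders), and the diagonal gets whatever residual variance the model includes. Relatedness therefore enters only through this block structure, with no branch lengths. The Python sketch below builds that matrix; the taxonomy labels and variance values are hypothetical, and a fitted GLMM would estimate the variances rather than take them as given.

```python
def taxonomic_cov(taxa, var_order, var_family, var_resid):
    """Covariance implied by nested order/family random intercepts.

    taxa: list of (order, family) pairs, one per species. A family only counts as
    shared when the order matches too, i.e. families are nested within orders.
    """
    n = len(taxa)
    cov = [[0.0] * n for _ in range(n)]
    for i, (oi, fi) in enumerate(taxa):
        for j, (oj, fj) in enumerate(taxa):
            if oi == oj:
                cov[i][j] += var_order
                if fi == fj:
                    cov[i][j] += var_family
            if i == j:
                cov[i][j] += var_resid
    return cov


# Hypothetical taxonomy: two families in order_A, one family in order_B.
TAXA = [("order_A", "family_1"), ("order_A", "family_1"),
        ("order_A", "family_2"), ("order_B", "family_3")]
for row in taxonomic_cov(TAXA, var_order=0.5, var_family=0.25, var_resid=1.0):
    print(row)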
> 
>> I then came across MCMCglmm, which allows a phylogenetic tree to be
>> used to deal with the phylogenetic structure of the data set.
> 
>> So i gave this a go as well!
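
By contrast, using the tree itself, as MCMCglmm does, corresponds roughly to a random effect whose covariance is proportional to the species' shared evolutionary history. Under a Brownian-motion model, the covariance between two species is the branch length from the root to their most recent common ancestor, and each species' variance is its root-to-tip length. The Python sketch below computes that matrix for a hypothetical tree; the tree encoding and the `bm_cov` name are illustrative assumptions, not part of any R package.

```python
def bm_cov(tree):
    """Brownian-motion covariance of tips: C[a][b] = root-to-MRCA path length.

    A node is (label, length) for a tip or ((children...), length) for an internal
    node; the root's own length is ignored.
    """
    cov = {}

    def visit(node, depth):
        content, _ = node
        if isinstance(content, str):                 # tip: variance = root-to-tip length
            cov[content, content] = depth
            return [content]
        groups = [visit(child, depth + child[1]) for child in content]
        for gi, left in enumerate(groups):           # tips in different child subtrees
            for right in groups[gi + 1:]:            # have this node as their MRCA
                for a in left:
                    for b in right:
                        cov[a, b] = cov[b, a] = depth
        return [t for g in groups for t in g]

    tips = visit(tree, 0.0)
    return tips, [[cov[a, b] for b in tips] for a in tips]


# Hypothetical ultrametric tree ((A:1,B:1):1,(C:0.5,D:0.5):1.5).
LEFT = ((("A", 1.0), ("B", 1.0)), 1.0)
RIGHT = ((("C", 0.5), ("D", 0.5)), 1.5)
tips, C = bm_cov(((LEFT, RIGHT), 0.0))
for tip, row in zip(tips, C):
    print(tip, row)
```

Unlike the taxonomic blocks, C and D here covary more strongly (1.5) than A and B (1.0) because they diverged more recently, which is the extra information the branch lengths carry.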
> 
>> Finally, to be consistent, I ran the model with no phylogenetic
>> control, using a GLM.
> 
>> My criterion for model selection was that of Burnham and Anderson:
>> using AIC and AICML etc. to select the "most likely" set of models.
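
The Burnham and Anderson approach rests on two quantities computed from each candidate model's AIC: the difference Delta_i = AIC_i - min(AIC) and the Akaike weight w_i = exp(-Delta_i / 2) / sum_j exp(-Delta_j / 2). Comparing these patterns, rather than raw AIC values, is what makes the four methods comparable, since raw AICs are only comparable between models fitted to the same data with the same likelihood. A minimal Python sketch with made-up AIC values (not results from this thread); the cutoff of 2 AIC units for the "set" is an assumed rule of thumb, exposed as a parameter.

```python
import math


def akaike_table(aic, cutoff=2.0):
    """Return {model: (delta, weight, in_set)} for a dict mapping model -> AIC.

    delta = AIC - min(AIC); weight = exp(-delta / 2) normalized to sum to 1;
    in_set flags models within `cutoff` AIC units of the best (cutoff is an assumption).
    """
    best = min(aic.values())
    deltas = {m: a - best for m, a in aic.items()}
    raw = {m: math.exp(-d / 2) for m, d in deltas.items()}
    total = sum(raw.values())
    return {m: (deltas[m], raw[m] / total, deltas[m] <= cutoff) for m in aic}


# Hypothetical AIC values for three candidate models fitted by one method.
AIC = {"m1": 100.0, "m2": 102.0, "null": 110.0}
ranked = sorted(akaike_table(AIC).items(), key=lambda kv: kv[1][0])
for model, (delta, weight, in_set) in ranked:
    print(f"{model:5s} dAIC={delta:5.2f} w={weight:.3f} {'*' if in_set else ''}")
```

Running the same function on each method's AIC values and comparing the deltas and weights side by side is one way to document that the methods "reflected each other".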
> 
>> Interestingly, I found that all four methods selected the same model
>> (based on AIC differences) as the most likely. Furthermore, the
>> patterns of AIC differences across the models mirrored each other.
> 
> I think your comparison is really worthwhile, but I don't see why
> this is surprising at all -- perhaps I'm missing something.  Given
> that you have reasonably strong effects of your predictors,
> reasonably minor differences in the way that you account for
> correlation won't change the qualitative conclusions.
> 
> Ben Bolker
> 


