[R-sig-phylo] GLM, GLMM, CAIC, mcmcGLMM - AIC model selection for phylogenetic data

Ben Bolker bbolker at gmail.com
Mon Jan 17 16:08:46 CET 2011


On 11-01-13 11:16 AM, Chris Mcowen wrote:
> The point of me doing this was i am new to the area so i wanted to
> make sure when i have my PhD viva / submit a paper i could defend my
> method - as you say
>> some might be better than others.
> 
> I am currently writing up and am unsure if i should make a chapter /
> paper out of comparing the results of the different methods or not.
> 
> So, if i interpret what you are saying correctly, the reason that all
> the methods i tried (phylogenetic and not ) gave the same answer is
> data specific?  if this is the case, and given all the research in
> this area i imagine it is, i wonder why
> 
>> You can have a strong phylogenetic signal; that still (as you have
>> demonstrated) doesn't mean that it will overturn the conclusions
>> based on a non-phylogenetic analysis.
> 
> The conclusion i would like to draw from this is why? Is there a
> pattern to this, could randomizations show why this occurs? Or is it
> something simple that i am making too complicated? I just don't
> understand how having a strong phylogenetic signal and using
> phylogenetic and no phylogenetic methods can give the same answer?
> And under what circumstances does this not hold true?
> 
> Chris

  Here's a simple example. Suppose that you have species within genera
and genera within families, and that at each level we have a polytomy --
that is, the genera all radiated in a single event, and then within each
genus the species radiated in a single event.  At each level (species
within genera and genera within families) there is a strong positive
correlation between trait X and trait Y.  If you plot the data in the
X-Y plane you'll see a series of clusters of points (corresponding to
different genera), but all the clusters will also be aligned along the
same linear trend. Because we have polytomies/stars at each level, the
results of taxonomy-based and phylogeny-based methods will be the same.
Because the within- and between-genus patterns are the same,
phylogenetic and non-phylogenetic methods will give about the same
answers (although they may differ in their effective number of degrees
of freedom/strength of inference).

  If instead the within-genus relationships ran opposite to the
between-genus trend (e.g. negative correlations within genera), then the
phylogenetic and non-phylogenetic methods would give different answers.
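
  A minimal sketch of these two cases, in Python for illustration. The
function names, sample sizes, and slopes are all assumptions made up for
this example, and the genus-centered slope is only a crude stand-in for a
taxonomy-aware analysis (it is not CAIC, a GLMM, or MCMCglmm):

```python
import random


def simulate(within_slope, between_slope=1.0, n_genera=8, n_species=10, seed=1):
    """Simulate traits X and Y for species clustered within genera."""
    rng = random.Random(seed)
    xs, ys, genus = [], [], []
    for g in range(n_genera):
        gx = 3.0 * g               # genus mean of X, evenly spaced (assumption)
        gy = between_slope * gx    # genus means follow the between-genus trend
        for _ in range(n_species):
            dx = rng.gauss(0.0, 1.0)                       # species deviation in X
            dy = within_slope * dx + rng.gauss(0.0, 0.2)   # within-genus trend + noise
            xs.append(gx + dx)
            ys.append(gy + dy)
            genus.append(g)
    return xs, ys, genus


def slope(xs, ys):
    """Ordinary least-squares slope of y on x."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def center_by_group(values, groups):
    """Subtract each group's mean from its members."""
    sums, counts = {}, {}
    for v, g in zip(values, groups):
        sums[g] = sums.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]


def compare(within_slope):
    """Return (pooled slope ignoring genus, within-genus slope)."""
    xs, ys, genus = simulate(within_slope)
    pooled = slope(xs, ys)
    within = slope(center_by_group(xs, genus), center_by_group(ys, genus))
    return pooled, within


if __name__ == "__main__":
    for w in (1.0, -1.0):
        pooled, within = compare(w)
        print(f"within-genus slope {w:+.1f}: pooled={pooled:.2f}, within={within:.2f}")
```

  With these settings the two slopes should roughly agree in the first
case; in the second, the pooled slope should stay positive while the
within-genus slope is negative.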

  Comparing different statistical approaches on a single data set is a
bit like making statistical inferences from a single data point.  In
order to compare methodologies you have to understand them at a
reasonably deep level.  If possible, use some mathematical analysis to
understand when the results will differ; otherwise, use your
understanding of the differences to give verbal or heuristic arguments
of when they will differ.  Simulations guided by this analysis will then
show the magnitude of the difference across a range of situations.
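
  For the toy example with genera as clusters, one such piece of
mathematical analysis is the standard decomposition of the pooled
least-squares slope into between- and within-genus parts. The notation
is introduced here for illustration: x_{gi} and y_{gi} are the traits of
species i in genus g, n_g is the number of species in genus g, and bars
denote genus means or grand means.

```latex
\begin{align*}
S_{xx}^{B} &= \sum_g n_g (\bar{x}_g - \bar{x})^2, &
S_{xy}^{B} &= \sum_g n_g (\bar{x}_g - \bar{x})(\bar{y}_g - \bar{y}), \\
S_{xx}^{W} &= \sum_g \sum_i (x_{gi} - \bar{x}_g)^2, &
S_{xy}^{W} &= \sum_g \sum_i (x_{gi} - \bar{x}_g)(y_{gi} - \bar{y}_g), \\
\hat{\beta}_B &= S_{xy}^{B} / S_{xx}^{B}, &
\hat{\beta}_W &= S_{xy}^{W} / S_{xx}^{W},
\end{align*}
\begin{align*}
\hat{\beta}_{\mathrm{pooled}}
  &= \frac{S_{xy}^{B} + S_{xy}^{W}}{S_{xx}^{B} + S_{xx}^{W}}
   = \frac{S_{xx}^{B}\,\hat{\beta}_B + S_{xx}^{W}\,\hat{\beta}_W}
          {S_{xx}^{B} + S_{xx}^{W}}, \\
\hat{\beta}_{\mathrm{pooled}} - \hat{\beta}_W
  &= \frac{S_{xx}^{B}}{S_{xx}^{B} + S_{xx}^{W}}
     \left(\hat{\beta}_B - \hat{\beta}_W\right).
\end{align*}
```

  So in this simplified setting the pooled and within-genus slopes
diverge only to the extent that the between- and within-genus slopes
differ, scaled by the share of the variation in X that lies between
genera.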

  cheers
    Ben Bolker

> On 13 Jan 2011, at 16:04, Ben Bolker wrote:
> 
> On 11-01-13 10:13 AM, Chris Mcowen wrote:
>> Hi Ben,
>> 
>> Thanks for the reply:
>> 
>> The Pagel test showed that there is a strong phylogenetic signal in
>> my data.
>> 
>> I don't believe i used
>>> reasonably minor differences in the way that you account for 
>>> correlation
>> 
>> as i used a method without any account for the correlation, i.e.
>> GLM, which selected the same model set and the same "best" model
>> as GLMM and the other "phylogenetic" methods.
>> 
>> I guess my central point is i spent a while researching methods to 
>> deal with phylogenetic structure, and there are various schools of 
>> thought of the best method. Some say using random effects in GLMM 
>> models is not capable of dealing with phylogenetic structure, whereas
>> others suggest the CAIC method is the best as it actually uses
>> the tree. So from these points of view there is actually a
>> considerable difference in the method used?
>> 
>> If you look at the model set "selected" based on AIC differences
>> by each method, they are the same, therefore is any method really
>> better for this type of investigation than another?
> 
> OK, "relatively minor" was an overstatement (sorry I overlooked the 
> fact that you used a GLM as well).  You can have a strong
> phylogenetic signal; that still (as you have demonstrated) doesn't
> mean that it will overturn the conclusions based on a
> non-phylogenetic analysis.
> 
> The fact that *for this analysis* all the methods used give the same 
> answer doesn't invalidate the general point that the different
> methods do different things and that some might be better than
> others.
> 
> I guess I'm curious what conclusions you're drawing from this
> exercise:
> 
> * "am I doing something wrong, or missing something obvious"? - I
> don't think so. * "why does everyone make such a fuss about the
> differences"? - Because they sometimes (just not in this case) have a
> big effect on the conclusions
> 
> Ben Bolker
> 
>> 
>> Chris
>> 
>> On 13 Jan 2011, at 15:00, Ben Bolker wrote:
>> 
>> On 11-01-13 04:38 AM, Chris Mcowen wrote:
>> 
>>> Dear list,
>> 
>>> I am modelling the effect of various life history traits of 
>>> species against their extinction rating.
>> 
>>> My data has a phylogenetic signal (see table below) so to be
>>> statistically correct i worked with phylogenetic independent
>>> contrasts
>> 
>> 
>>> From Purvis et al., 2000 "phylogenetic analyses were necessary 
>>> because of the pseudoreplication and, hence, elevated type I
>>> error rates that result from treating species as independent
>>> points when relevant variables show a phylogenetic pattern"
>> 
>>> Variable                 λ (Pagel)
>>> IUCN extinction risk     0.47
>>> Breeding system          0.99
>>> Endosperm                0.96
>>> Floral symmetry          0.93
>>> Fruit                    0.97
>>> Pollen dispersal         0.99
>>> Seasonality              0.52
>>> Storage organ            0.85
>>> Woodyness                0.61
>> 
>>> There are various ways of doing this. One method is to use the
>>> package CAIC (comparative analysis by independent contrasts),
>>> which utilizes the phylogeny to generate the independent
>>> contrasts. However, as i was using it i found this, from Sodhi
>>> et al., 2008:
>> 
>>>> It was necessary to decompose the variance across species by 
>>>> coding the random-effects error structure of the GLMM as a 
>>>> hierarchical taxonomic (class/order/ family) effect (Blackburn
>>>> & Duncan, 2001). We had insufficient replication within genera
>>>> to include the genus in the nested random effect. Our method is
>>>> more appropriate than the independent-contrasts approach
>>>> (Purvis et al., 2000) in situations where a complete phylogeny
>>>> of the study taxon is unavailable, when categorical variables
>>>> are included in the analysis, and when model selection, rather
>>>> than hypothesis testing, is the statistical paradigm being
>>>> used.
>> 
>>> So i gave this a go - using GLMM and setting the order / family
>>> as random effects.
>> 
>>> I then came across mcmcGLMM which allows the use of a
>>> phylogenetic tree to deal with the phylogenetic structure of the
>>> data set.
>> 
>>> So i gave this a go as well!
>> 
>>> Finally to be consistent i ran the model with no phylogenetic 
>>> control - using GLM.
>> 
>>> My criteria for model selection was that of Burnham and Anderson
>>> - using AIC and AICML etc to select the "most likely" set of
>>> models.
>> 
>>> Interestingly i found that using all four methods the same model 
>>> (based on AIC difference) was selected as the most likely? 
>>> Furthermore, the pattern of AIC differences across the models 
>>> reflected each other.
>> 
>> I think your comparison is really worthwhile, but I don't see why 
>> this is surprising at all -- perhaps I'm missing something.  Given 
>> that you have reasonably strong effects of your predictors, 
>> reasonably minor differences in the way that you account for 
>> correlation won't change the qualitative conclusions.
>> 
>> Ben Bolker
>> 
> 


