[R-sig-ME] Unclear output from MCMCglmm with categorical predictors

Wed Nov 14 18:22:24 CET 2018

Hi Jarrod,
Thanks so much for the helpful comments.
I was just about to check why models with and without the ginverse  
argument were giving similar estimates when a strong phylogenetic  
signal is expected... Much appreciated!
And thanks again Walid- I now understand what you meant in your answer.

All the best,
Roi

Quoting HADFIELD Jarrod <j.hadfield using ed.ac.uk>:

> Hi,
>
> Sorry - I also noted some errors in your example, and the original code.
>
> 1/ In the example you do not fix the residual variance (to one)  
> which you should do because it is not identifiable in threshold  
> models.
>
> 2/ You have what seems to be a phylogenetic correlation structure  
> passed to ginverse, but you don’t associate it with a random term.
>
> Cheers,
>
> Jarrod
>
>> On 12 Nov 2018, at 12:26, roimaor using post.tau.ac.il wrote:
>>
>>
>> Thanks for your response Walid.
>> However, it doesn't really answer my questions, so I'll try to  
>> clarify those.
>>
>> An intercept estimate in my model makes no biological sense because  
>> all species have both diet and habitat. This is why I chose to  
>> suppress the intercept.
>>
>> In the context of categorical predictors, suppressing the intercept  
>>  means that an arbitrarily-chosen category is taken as baseline,  
>> and the model estimates the difference (in the response variable)  
>> between this baseline and all other categories.
>> When dealing with two categorical traits as predictors, each data  
>> point is a combination of the effects of both traits, i.e. it is a  
>> combination of one category from each trait.
>>
>> If this is indeed the case, the baseline value must include one  
>> category of each predictor in the model, otherwise their effects  
>> would be confounded.
>>
>> For example, in my model I would expect one diet category AND one  
>> habitat category, say, 'herbivorous' and 'aquatic', to be  
>> baseline/intercept (un-estimated).
>> Instead, it seems that only 'herbivorous' is absorbed in the  
>> intercept, while all habitat classes have posterior estimates.
>>
>> My questions are:
>> 1- why does no habitat category get absorbed in the intercept? and
>> 2- what can I do to fix that?
>>
>> If it's at all useful, here is an example:
>>
>> ## data format
>> unique(Tdata$Habitat)
>> [1] TE AR AQ ST SA
>> Levels: AQ AR SA ST TE
>>
>> unique(Tdata$Diet)
>> [1] Herb Omni Faun
>> Levels: Faun Herb Omni
>>
>> ModTHRE <- MCMCglmm(Response ~ -1 + Habitat + Diet,
>>                    prior = list(R = list(V = 1, nu = 0.002)),
>>                    ginverse = list(Binomial=INphylo$Ainv),
>>                    burnin = 25000, nitt = 225000, thin = 200,
>>                    family = "threshold",
>>                    data = Tdata,
>>                    verbose = FALSE)
>>
>> ## model output
>> summary(ModTHRE)
>>
>> Iterations = 25001:224801
>> Thinning interval  = 200
>> Sample size  = 1000
>>
>> DIC: 2321.861
>>
>> R-structure:  ~units
>>
>>      post.mean l-95% CI u-95% CI eff.samp
>> units         1   0.9973    1.003     1000
>>
>> Location effects: Response ~ -1 + ForagingHab + Troph_Lev
>>
>>            post.mean l-95% CI u-95% CI eff.samp  pMCMC
>> HabitatST   -0.07612 -0.42629  0.33746   1000.0  0.694
>> HabitatTE   -0.36682 -0.52533 -0.21118   1000.0 <0.001 ***
>> HabitatSA   -0.04691 -0.25769  0.19636   1000.0  0.724
>> HabitatAR   -0.07302 -0.29900  0.16681    883.3  0.548
>> HabitatAQ   -0.47948 -0.82598 -0.15793   1000.0  0.002 **
>> DietHerb     0.30708  0.11255  0.48545   1000.0  0.002 **
>> DietOmni     0.05263 -0.12176  0.26105   1000.0  0.604
>> ---
>> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>>
>> Cutpoints:
>>                         post.mean l-95% CI u-95% CI eff.samp
>> cutpoint.traitActivity.1    0.4911    0.436   0.5495    877.6
>>
>>
>>
>> Any thoughts, ideas of what's going on here, or corrections of my  
>> mis-perceptions, are all very welcome.
>>
>> Many thanks,
>> Roi
>>
>>
>> Quoting Walid Mawass <walidmawass10 using gmail.com>:
>>
>>> Hello,
>>>
>>> by adding -1 into your model's formula, you are explicitly calling for no
>>> intercept term to be included in the estimates. If you do want an intercept
>>> term which would include a baseline level for each of your categorical
>>> variables then the syntax should be:
>>>
>>> MCMCglmm(Activity ~1 + Habitat + Diet + log(Mass) + Max.Temp... (explicit
>>> call for intercept term)
>>> or
>>> MCMCglmm(Activity ~ Habitat + Diet + log(Mass) + Max.Temp... (implicit call
>>> for intercept term)
>>>
>>> Hope this helps
>>> --
>>> Walid Mawass
>>> Ph.D. candidate in Cellular and Molecular Biology
>>> Population Genetics Laboratory
>>> University of Québec at Trois-Rivières
>>> 3351, boul. des Forges, C.P. 500
>>> Trois-Rivières (Québec) G9A 5H7
>>> Telephone: 819-376-5011 poste 3384
>>>
>>>
>>> On Wed, Nov 7, 2018 at 12:09 PM <roimaor using post.tau.ac.il> wrote:
>>>
>>>> Dear list members,
>>>>
>>>> I have a model with categorical response and categorical + continuous
>>>> predictors.
>>>>
>>>> My model has two categorical predictors: "diet" (3 levels) and
>>>> "habitat" (6 levels):
>>>>
>>>> THRE1 <- MCMCglmm(Activity ~ -1 + Habitat + Diet + log(Mass) + Max.Temp,
>>>>                      prior = list(R = list(V = 1, fix = 1)),
>>>>                      ginverse = list(Binomial=INphylo$Ainv),
>>>>                      family = "threshold",
>>>>                      data = Tdata)
>>>>
>>>> If I understand correctly, in this configuration the algorithm
>>>> shouldn't return estimated values for the effect of each level of a
>>>> categorical predictor, instead, it returns a contrast between that
>>>> level and another level which was arbitrarily chosen as the base
>>>> level. Each species (data point) has a value for each of these traits,
>>>> so I would expect them to be estimated independently, meaning that one
>>>> level of each predictor should be the 'baseline' and absorbed into the
>>>> global intercept. In that case I expect 2 contrasts to be returned for
>>>> diet categories and 5 contrasts for habitat.
>>>>
>>>> However, I get 2 estimates (presumably contrasts) for diet categories,
>>>> and 6 for habitat categories, i.e., no habitat category was designated
>>>> as baseline, which makes me question whether the estimates are
>>>> contrasts or actual effect sizes.
>>>>
>>>> My questions:
>>>> - Is the algorithm pooling all the predictor categories as if they
>>>> were a single trait with 8 levels?
>>>> - If the habitat effect estimates are contrasts - what are they compared
>>>> to?
>>>> - If they are effect sizes - what did I do to not get the contrasts as
>>>> I expected?
>>>>
>>>> Any help would be much appreciated!
>>>> Thanks,
>>>> --
>>>> Roi Maor
>>>> PhD candidate
>>>> School of Zoology, Tel Aviv University
>>>> Centre for Biodiversity and Environment Research, UCL
>>>>
>>>> _______________________________________________
>>>> R-sig-mixed-models using r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>>
>>
>> _______________________________________________
>> R-sig-mixed-models using r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>
>
> --
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.