[R-sig-ME] Unclear output from MCMCglmm with categorical predictors

Mon Nov 12 20:35:39 CET 2018

Hi, 

Sorry - I also noted some errors in your example, and the original code. 

1/ In the example you do not fix the residual variance (to one) which you should do because it is not identifiable in threshold models.

2/ You have what seems to be a phylogenetic correlation structure passed to ginverse, but you don’t associate it with a random term.

Cheers,

Jarrod

> On 12 Nov 2018, at 12:26, roimaor using post.tau.ac.il wrote:
> 
> 
> Thanks for your response Walid.
> However, it doesn't really answer my questions, so I'll try to clarify those.
> 
> An intercept estimate in my model makes no biological sense because all species have both diet and habitat. This is why I chose to suppress the intercept.
> 
> In the context of categorical predictors, suppressing the intercept  means that an arbitrarily-chosen category is taken as baseline, and the model estimates the difference (in the response variable) between this baseline and all other categories.
> When dealing with two categorical traits as predictors, each data point is a combination of the effects of both traits, i.e. it is a combination of one category from each trait.
> 
> If this is indeed the case, the baseline value must include one category of each predictor in the model, otherwise their effects would be confounded.
> 
> For example, in my model I would expect one diet category AND one habitat category, say, 'herbivorous' and 'aquatic', to be baseline/intercept (un-estimated).
> Instead, it seems that only 'herbivorous' is absorbed in the intercept, while all habitat classes have posterior estimates.
> 
> My questions are:
> 1- why does no habitat category get absorbed in the intercept? and
> 2- what can I do to fix that?
> 
> If it's at all useful, here is an example:
> 
> ## data format
> unique(Tdata$Habitat)
> [1] TE AR AQ ST SA
> Levels: AQ AR SA ST TE
> 
> unique(Tdata$Diet)
> [1] Herb Omni Faun
> Levels: Faun Herb Omni
> 
> ModTHRE <- MCMCglmm(Response ~ -1 + Habitat + Diet,
>                    prior = list(R = list(V = 1, nu = 0.002)),
>                    ginverse = list(Binomial=INphylo$Ainv),
>                    burnin = 25000, nitt = 225000, thin = 200,
>                    family = "threshold",
>                    data = Tdata,
>                    verbose = FALSE)
> 
> ## model output
> summary(ModTHRE)
> 
> Iterations = 25001:224801
> Thinning interval  = 200
> Sample size  = 1000
> 
> DIC: 2321.861
> 
> R-structure:  ~units
> 
>      post.mean l-95% CI u-95% CI eff.samp
> units         1   0.9973    1.003     1000
> 
> Location effects: Response ~ -1 + ForagingHab + Troph_Lev
> 
>            post.mean l-95% CI u-95% CI eff.samp  pMCMC
> HabitatST   -0.07612 -0.42629  0.33746   1000.0  0.694
> HabitatTE   -0.36682 -0.52533 -0.21118   1000.0 <0.001 ***
> HabitatSA   -0.04691 -0.25769  0.19636   1000.0  0.724
> HabitatAR   -0.07302 -0.29900  0.16681    883.3  0.548
> HabitatAQ   -0.47948 -0.82598 -0.15793   1000.0  0.002 **
> DietHerb     0.30708  0.11255  0.48545   1000.0  0.002 **
> DietOmni     0.05263 -0.12176  0.26105   1000.0  0.604
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> 
> Cutpoints:
>                         post.mean l-95% CI u-95% CI eff.samp
> cutpoint.traitActivity.1    0.4911    0.436   0.5495    877.6
> 
> 
> 
> Any thoughts, ideas of what's going on here, or corrections of my mis-perceptions, are all very welcome.
> 
> Many thanks,
> Roi
> 
> 
> Quoting Walid Mawass <walidmawass10 using gmail.com>:
> 
>> Hello,
>> 
>> by adding -1 into your model's formula, you are explicitly calling for no
>> intercept term to be included in the estimates. If you do want an intercept
>> term which would include a baseline level for each of your categorical
>> variables then the syntax should be:
>> 
>> MCMCglmm(Activity ~1 + Habitat + Diet + log(Mass) + Max.Temp... (explicit
>> call for intercept term)
>> or
>> MCMCglmm(Activity ~ Habitat + Diet + log(Mass) + Max.Temp... (implicit call
>> for intercept term)
>> 
>> Hope this helps
>> --
>> Walid Mawass
>> Ph.D. candidate in Cellular and Molecular Biology
>> Population Genetics Laboratory
>> University of Québec at Trois-Rivières
>> 3351, boul. des Forges, C.P. 500
>> Trois-Rivières (Québec) G9A 5H7
>> Telephone: 819-376-5011 poste 3384
>> 
>> 
>> On Wed, Nov 7, 2018 at 12:09 PM <roimaor using post.tau.ac.il> wrote:
>> 
>>> Dear list members,
>>> 
>>> I have a model with categorical response and categorical + continuous
>>> predictors.
>>> 
>>> My model has two categorical predictors: "diet" (3 levels) and
>>> "habitat" (6 levels):
>>> 
>>> THRE1 <- MCMCglmm(Activity ~ -1 + Habitat + Diet + log(Mass) + Max.Temp,
>>>                      prior = list(R = list(V = 1, fix = 1)),
>>>                      ginverse = list(Binomial=INphylo$Ainv),
>>>                      family = "threshold",
>>>                      data = Tdata)
>>> 
>>> If I understand correctly, in this configuration the algorithm
>>> shouldn't return estimated values for the effect of each level of a
>>> categorical predictor, instead, it returns a contrast between that
>>> level and another level which was arbitrarily chosen as the base
>>> level. Each species (data point) has a value for each of these traits,
>>> so I would expect them to be estimated independently, meaning that one
>>> level of each predictor should be the 'baseline' and absorbed into the
>>> global intercept. In that case I expect 2 contrasts to be returned for
>>> diet categories and 5 contrasts for habitat.
>>> 
>>> However, I get 2 estimates (presumably contrasts) for diet categories,
>>> and 6 for habitat categories, i.e., no habitat category was designated
>>> as baseline, which makes me question whether the estimates are
>>> contrasts or actual effect sizes.
>>> 
>>> My questions:
>>> - Is the algorithm pooling all the predictor categories as if they
>>> were a single trait with 8 levels?
>>> - If the habitat effect estimates are contrasts - what are they compared
>>> to?
>>> - If they are effect sizes - what did I do to not get the contrasts as
>>> I expected?
>>> 
>>> Any help would be much appreciated!
>>> Thanks,
>>> --
>>> Roi Maor
>>> PhD candidate
>>> School of Zoology, Tel Aviv University
>>> Centre for Biodiversity and Environment Research, UCL
>>> 
>>> _______________________________________________
>>> R-sig-mixed-models using r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>> 
> 
> _______________________________________________
> R-sig-mixed-models using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.