[R-sig-ME] Unclear output from MCMCglmm with categorical predictors

Mon Nov 12 13:26:52 CET 2018

Thanks for your response Walid.
However, it doesn't really answer my questions, so I'll try to clarify those.

An intercept estimate in my model makes no biological sense because  
all species have both diet and habitat. This is why I chose to  
suppress the intercept.

In the context of categorical predictors, suppressing the intercept   
means that an arbitrarily-chosen category is taken as baseline, and  
the model estimates the difference (in the response variable) between  
this baseline and all other categories.
When dealing with two categorical traits as predictors, each data  
point is a combination of the effects of both traits, i.e. it is a  
combination of one category from each trait.

If this is indeed the case, the baseline value must include one  
category of each predictor in the model, otherwise their effects would  
be confounded.

For example, in my model I would expect one diet category AND one  
habitat category, say, 'herbivorous' and 'aquatic', to be  
baseline/intercept (un-estimated).
Instead, it seems that only 'herbivorous' is absorbed in the  
intercept, while all habitat classes have posterior estimates.

My questions are:
1- why does no habitat category get absorbed in the intercept? and
2- what can I do to fix that?

If it's at all useful, here is an example:

## data format
unique(Tdata$Habitat)
[1] TE AR AQ ST SA
Levels: AQ AR SA ST TE

unique(Tdata$Diet)
[1] Herb Omni Faun
Levels: Faun Herb Omni

ModTHRE <- MCMCglmm(Response ~ -1 + Habitat + Diet,
                     prior = list(R = list(V = 1, nu = 0.002)),
                     ginverse = list(Binomial=INphylo$Ainv),
                     burnin = 25000, nitt = 225000, thin = 200,
                     family = "threshold",
                     data = Tdata,
                     verbose = FALSE)

## model output
summary(ModTHRE)

  Iterations = 25001:224801
  Thinning interval  = 200
  Sample size  = 1000

  DIC: 2321.861

  R-structure:  ~units

       post.mean l-95% CI u-95% CI eff.samp
units         1   0.9973    1.003     1000

  Location effects: Response ~ -1 + ForagingHab + Troph_Lev

             post.mean l-95% CI u-95% CI eff.samp  pMCMC
HabitatST   -0.07612 -0.42629  0.33746   1000.0  0.694
HabitatTE   -0.36682 -0.52533 -0.21118   1000.0 <0.001 ***
HabitatSA   -0.04691 -0.25769  0.19636   1000.0  0.724
HabitatAR   -0.07302 -0.29900  0.16681    883.3  0.548
HabitatAQ   -0.47948 -0.82598 -0.15793   1000.0  0.002 **
DietHerb     0.30708  0.11255  0.48545   1000.0  0.002 **
DietOmni     0.05263 -0.12176  0.26105   1000.0  0.604
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

  Cutpoints:
                          post.mean l-95% CI u-95% CI eff.samp
cutpoint.traitActivity.1    0.4911    0.436   0.5495    877.6

Any thoughts, ideas of what's going on here, or corrections of my  
mis-perceptions, are all very welcome.

Many thanks,
Roi

Quoting Walid Mawass <walidmawass10 using gmail.com>:

> Hello,
>
> by adding -1 into your model's formula, you are explicitly calling for no
> intercept term to be included in the estimates. If you do want an intercept
> term which would include a baseline level for each of your categorical
> variables then the syntax should be:
>
> MCMCglmm(Activity ~1 + Habitat + Diet + log(Mass) + Max.Temp... (explicit
> call for intercept term)
> or
> MCMCglmm(Activity ~ Habitat + Diet + log(Mass) + Max.Temp... (implicit call
> for intercept term)
>
> Hope this helps
> --
> Walid Mawass
> Ph.D. candidate in Cellular and Molecular Biology
> Population Genetics Laboratory
> University of Québec at Trois-Rivières
> 3351, boul. des Forges, C.P. 500
> Trois-Rivières (Québec) G9A 5H7
> Telephone: 819-376-5011 poste 3384
>
>
> On Wed, Nov 7, 2018 at 12:09 PM <roimaor using post.tau.ac.il> wrote:
>
>> Dear list members,
>>
>> I have a model with categorical response and categorical + continuous
>> predictors.
>>
>> My model has two categorical predictors: "diet" (3 levels) and
>> "habitat" (6 levels):
>>
>> THRE1 <- MCMCglmm(Activity ~ -1 + Habitat + Diet + log(Mass) + Max.Temp,
>>                       prior = list(R = list(V = 1, fix = 1)),
>>                       ginverse = list(Binomial=INphylo$Ainv),
>>                       family = "threshold",
>>                       data = Tdata)
>>
>> If I understand correctly, in this configuration the algorithm
>> shouldn't return estimated values for the effect of each level of a
>> categorical predictor, instead, it returns a contrast between that
>> level and another level which was arbitrarily chosen as the base
>> level. Each species (data point) has a value for each of these traits,
>> so I would expect them to be estimated independently, meaning that one
>> level of each predictor should be the 'baseline' and absorbed into the
>> global intercept. In that case I expect 2 contrasts to be returned for
>> diet categories and 5 contrasts for habitat.
>>
>> However, I get 2 estimates (presumably contrasts) for diet categories,
>> and 6 for habitat categories, i.e., no habitat category was designated
>> as baseline, which makes me question whether the estimates are
>> contrasts or actual effect sizes.
>>
>> My questions:
>> - Is the algorithm pooling all the predictor categories as if they
>> were a single trait with 8 levels?
>> - If the habitat effect estimates are contrasts - what are they compared
>> to?
>> - If they are effect sizes - what did I do to not get the contrasts as
>> I expected?
>>
>> Any help would be much appreciated!
>> Thanks,
>> --
>> Roi Maor
>> PhD candidate
>> School of Zoology, Tel Aviv University
>> Centre for Biodiversity and Environment Research, UCL
>>
>> _______________________________________________
>> R-sig-mixed-models using r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>