[R-sig-ME] Unclear output from MCMCglmm with categorical predictors
roim@or m@ili@g off post@t@u@@c@il
roim@or m@ili@g off post@t@u@@c@il
Mon Nov 12 13:26:52 CET 2018
Thanks for your response Walid.
However, it doesn't really answer my questions, so I'll try to clarify those.
An intercept estimate in my model makes no biological sense because
all species have both diet and habitat. This is why I chose to
suppress the intercept.
In the context of categorical predictors, suppressing the intercept
means that an arbitrarily-chosen category is taken as baseline, and
the model estimates the difference (in the response variable) between
this baseline and all other categories.
When dealing with two categorical traits as predictors, each data
point is a combination of the effects of both traits, i.e. it is a
combination of one category from each trait.
If this is indeed the case, the baseline value must include one
category of each predictor in the model, otherwise their effects would
be confounded.
For example, in my model I would expect one diet category AND one
habitat category, say, 'herbivorous' and 'aquatic', to be
baseline/intercept (un-estimated).
Instead, it seems that only 'herbivorous' is absorbed in the
intercept, while all habitat classes have posterior estimates.
My questions are:
1- why does no habitat category get absorbed in the intercept? and
2- what can I do to fix that?
If it's at all useful, here is an example:
## data format
unique(Tdata$Habitat)
[1] TE AR AQ ST SA
Levels: AQ AR SA ST TE
unique(Tdata$Diet)
[1] Herb Omni Faun
Levels: Faun Herb Omni
ModTHRE <- MCMCglmm(Response ~ -1 + Habitat + Diet,
prior = list(R = list(V = 1, nu = 0.002)),
ginverse = list(Binomial=INphylo$Ainv),
burnin = 25000, nitt = 225000, thin = 200,
family = "threshold",
data = Tdata,
verbose = FALSE)
## model output
summary(ModTHRE)
Iterations = 25001:224801
Thinning interval = 200
Sample size = 1000
DIC: 2321.861
R-structure: ~units
post.mean l-95% CI u-95% CI eff.samp
units 1 0.9973 1.003 1000
Location effects: Response ~ -1 + ForagingHab + Troph_Lev
post.mean l-95% CI u-95% CI eff.samp pMCMC
HabitatST -0.07612 -0.42629 0.33746 1000.0 0.694
HabitatTE -0.36682 -0.52533 -0.21118 1000.0 <0.001 ***
HabitatSA -0.04691 -0.25769 0.19636 1000.0 0.724
HabitatAR -0.07302 -0.29900 0.16681 883.3 0.548
HabitatAQ -0.47948 -0.82598 -0.15793 1000.0 0.002 **
DietHerb 0.30708 0.11255 0.48545 1000.0 0.002 **
DietOmni 0.05263 -0.12176 0.26105 1000.0 0.604
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Cutpoints:
post.mean l-95% CI u-95% CI eff.samp
cutpoint.traitActivity.1 0.4911 0.436 0.5495 877.6
Any thoughts, ideas of what's going on here, or corrections of my
mis-perceptions, are all very welcome.
Many thanks,
Roi
Quoting Walid Mawass <walidmawass10 using gmail.com>:
> Hello,
>
> by adding -1 into your model's formula, you are explicitly calling for no
> intercept term to be included in the estimates. If you do want an intercept
> term which would include a baseline level for each of your categorical
> variables then the syntax should be:
>
> MCMCglmm(Activity ~1 + Habitat + Diet + log(Mass) + Max.Temp... (explicit
> call for intercept term)
> or
> MCMCglmm(Activity ~ Habitat + Diet + log(Mass) + Max.Temp... (implicit call
> for intercept term)
>
> Hope this helps
> --
> Walid Mawass
> Ph.D. candidate in Cellular and Molecular Biology
> Population Genetics Laboratory
> University of Québec at Trois-Rivières
> 3351, boul. des Forges, C.P. 500
> Trois-Rivières (Québec) G9A 5H7
> Telephone: 819-376-5011 poste 3384
>
>
> On Wed, Nov 7, 2018 at 12:09 PM <roimaor using post.tau.ac.il> wrote:
>
>> Dear list members,
>>
>> I have a model with categorical response and categorical + continuous
>> predictors.
>>
>> My model has two categorical predictors: "diet" (3 levels) and
>> "habitat" (6 levels):
>>
>> THRE1 <- MCMCglmm(Activity ~ -1 + Habitat + Diet + log(Mass) + Max.Temp,
>> prior = list(R = list(V = 1, fix = 1)),
>> ginverse = list(Binomial=INphylo$Ainv),
>> family = "threshold",
>> data = Tdata)
>>
>> If I understand correctly, in this configuration the algorithm
>> shouldn't return estimated values for the effect of each level of a
>> categorical predictor, instead, it returns a contrast between that
>> level and another level which was arbitrarily chosen as the base
>> level. Each species (data point) has a value for each of these traits,
>> so I would expect them to be estimated independently, meaning that one
>> level of each predictor should be the 'baseline' and absorbed into the
>> global intercept. In that case I expect 2 contrasts to be returned for
>> diet categories and 5 contrasts for habitat.
>>
>> However, I get 2 estimates (presumably contrasts) for diet categories,
>> and 6 for habitat categories, i.e., no habitat category was designated
>> as baseline, which makes me question whether the estimates are
>> contrasts or actual effect sizes.
>>
>> My questions:
>> - Is the algorithm pooling all the predictor categories as if they
>> were a single trait with 8 levels?
>> - If the habitat effect estimates are contrasts - what are they compared
>> to?
>> - If they are effect sizes - what did I do to not get the contrasts as
>> I expected?
>>
>> Any help would be much appreciated!
>> Thanks,
>> --
>> Roi Maor
>> PhD candidate
>> School of Zoology, Tel Aviv University
>> Centre for Biodiversity and Environment Research, UCL
>>
>> _______________________________________________
>> R-sig-mixed-models using r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>
More information about the R-sig-mixed-models
mailing list