[R-sig-ME] Unclear output from MCMCglmm with categorical predictors
j@h@dfield @ending from ed@@c@uk
Mon Nov 12 20:35:39 CET 2018
Sorry - I also noted some errors in your example, and the original code.
1/ In the example you do not fix the residual variance (to one) which you should do because it is not identifiable in threshold models.
2/ You have what seems to be a phylogenetic correlation structure passed to ginverse, but you don’t associate it with a random term.
> On 12 Nov 2018, at 12:26, roimaor using post.tau.ac.il wrote:
> Thanks for your response Walid.
> However, it doesn't really answer my questions, so I'll try to clarify those.
> An intercept estimate in my model makes no biological sense because all species have both diet and habitat. This is why I chose to suppress the intercept.
> In the context of categorical predictors, suppressing the intercept means that an arbitrarily-chosen category is taken as baseline, and the model estimates the difference (in the response variable) between this baseline and all other categories.
> When dealing with two categorical traits as predictors, each data point is a combination of the effects of both traits, i.e. it is a combination of one category from each trait.
> If this is indeed the case, the baseline value must include one category of each predictor in the model, otherwise their effects would be confounded.
> For example, in my model I would expect one diet category AND one habitat category, say, 'herbivorous' and 'aquatic', to be baseline/intercept (un-estimated).
> Instead, it seems that only 'herbivorous' is absorbed in the intercept, while all habitat classes have posterior estimates.
> My questions are:
> 1- why does no habitat category get absorbed in the intercept? and
> 2- what can I do to fix that?
> If it's at all useful, here is an example:
> ## data format
>  TE AR AQ ST SA
> Levels: AQ AR SA ST TE
>  Herb Omni Faun
> Levels: Faun Herb Omni
> ModTHRE <- MCMCglmm(Response ~ -1 + Habitat + Diet,
> prior = list(R = list(V = 1, nu = 0.002)),
> ginverse = list(Binomial=INphylo$Ainv),
> burnin = 25000, nitt = 225000, thin = 200,
> family = "threshold",
> data = Tdata,
> verbose = FALSE)
> ## model output
> Iterations = 25001:224801
> Thinning interval = 200
> Sample size = 1000
> DIC: 2321.861
> R-structure: ~units
> post.mean l-95% CI u-95% CI eff.samp
> units 1 0.9973 1.003 1000
> Location effects: Response ~ -1 + ForagingHab + Troph_Lev
> post.mean l-95% CI u-95% CI eff.samp pMCMC
> HabitatST -0.07612 -0.42629 0.33746 1000.0 0.694
> HabitatTE -0.36682 -0.52533 -0.21118 1000.0 <0.001 ***
> HabitatSA -0.04691 -0.25769 0.19636 1000.0 0.724
> HabitatAR -0.07302 -0.29900 0.16681 883.3 0.548
> HabitatAQ -0.47948 -0.82598 -0.15793 1000.0 0.002 **
> DietHerb 0.30708 0.11255 0.48545 1000.0 0.002 **
> DietOmni 0.05263 -0.12176 0.26105 1000.0 0.604
> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> post.mean l-95% CI u-95% CI eff.samp
> cutpoint.traitActivity.1 0.4911 0.436 0.5495 877.6
> Any thoughts, ideas of what's going on here, or corrections of my mis-perceptions, are all very welcome.
> Many thanks,
> Quoting Walid Mawass <walidmawass10 using gmail.com>:
>> by adding -1 into your model's formula, you are explicitly calling for no
>> intercept term to be included in the estimates. If you do want an intercept
>> term which would include a baseline level for each of your categorical
>> variables then the syntax should be:
>> MCMCglmm(Activity ~1 + Habitat + Diet + log(Mass) + Max.Temp... (explicit
>> call for intercept term)
>> MCMCglmm(Activity ~ Habitat + Diet + log(Mass) + Max.Temp... (implicit call
>> for intercept term)
>> Hope this helps
>> Walid Mawass
>> Ph.D. candidate in Cellular and Molecular Biology
>> Population Genetics Laboratory
>> University of Québec at Trois-Rivières
>> 3351, boul. des Forges, C.P. 500
>> Trois-Rivières (Québec) G9A 5H7
>> Telephone: 819-376-5011 poste 3384
>> On Wed, Nov 7, 2018 at 12:09 PM <roimaor using post.tau.ac.il> wrote:
>>> Dear list members,
>>> I have a model with categorical response and categorical + continuous
>>> My model has two categorical predictors: "diet" (3 levels) and
>>> "habitat" (6 levels):
>>> THRE1 <- MCMCglmm(Activity ~ -1 + Habitat + Diet + log(Mass) + Max.Temp,
>>> prior = list(R = list(V = 1, fix = 1)),
>>> ginverse = list(Binomial=INphylo$Ainv),
>>> family = "threshold",
>>> data = Tdata)
>>> If I understand correctly, in this configuration the algorithm
>>> shouldn't return estimated values for the effect of each level of a
>>> categorical predictor, instead, it returns a contrast between that
>>> level and another level which was arbitrarily chosen as the base
>>> level. Each species (data point) has a value for each of these traits,
>>> so I would expect them to be estimated independently, meaning that one
>>> level of each predictor should be the 'baseline' and absorbed into the
>>> global intercept. In that case I expect 2 contrasts to be returned for
>>> diet categories and 5 contrasts for habitat.
>>> However, I get 2 estimates (presumably contrasts) for diet categories,
>>> and 6 for habitat categories, i.e., no habitat category was designated
>>> as baseline, which makes me question whether the estimates are
>>> contrasts or actual effect sizes.
>>> My questions:
>>> - Is the algorithm pooling all the predictor categories as if they
>>> were a single trait with 8 levels?
>>> - If the habitat effect estimates are contrasts - what are they compared
>>> - If they are effect sizes - what did I do to not get the contrasts as
>>> I expected?
>>> Any help would be much appreciated!
>>> Roi Maor
>>> PhD candidate
>>> School of Zoology, Tel Aviv University
>>> Centre for Biodiversity and Environment Research, UCL
>>> R-sig-mixed-models using r-project.org mailing list
> R-sig-mixed-models using r-project.org mailing list
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
More information about the R-sig-mixed-models