[R-sig-ME] Unclear output from MCMCglmm with categorical predictors
roim@or m@ili@g off post@t@u@@c@il
roim@or m@ili@g off post@t@u@@c@il
Wed Nov 14 18:22:24 CET 2018
Hi Jarrod,
Thanks so much for the helpful comments.
I was just about to check why models with and without the ginverse
argument were giving similar estimates when a strong phylogenetic
signal is expected... Much appreciated!
And thanks again Walid- I now understand what you meant in your answer.
All the best,
Roi
Quoting HADFIELD Jarrod <j.hadfield using ed.ac.uk>:
> Hi,
>
> Sorry - I also noted some errors in your example, and the original code.
>
> 1/ In the example you do not fix the residual variance (to one)
> which you should do because it is not identifiable in threshold
> models.
>
> 2/ You have what seems to be a phylogenetic correlation structure
> passed to ginverse, but you don’t associate it with a random term.
>
> Cheers,
>
> Jarrod
>
>> On 12 Nov 2018, at 12:26, roimaor using post.tau.ac.il wrote:
>>
>>
>> Thanks for your response Walid.
>> However, it doesn't really answer my questions, so I'll try to
>> clarify those.
>>
>> An intercept estimate in my model makes no biological sense because
>> all species have both diet and habitat. This is why I chose to
>> suppress the intercept.
>>
>> In the context of categorical predictors, suppressing the intercept
>> means that an arbitrarily-chosen category is taken as baseline,
>> and the model estimates the difference (in the response variable)
>> between this baseline and all other categories.
>> When dealing with two categorical traits as predictors, each data
>> point is a combination of the effects of both traits, i.e. it is a
>> combination of one category from each trait.
>>
>> If this is indeed the case, the baseline value must include one
>> category of each predictor in the model, otherwise their effects
>> would be confounded.
>>
>> For example, in my model I would expect one diet category AND one
>> habitat category, say, 'herbivorous' and 'aquatic', to be
>> baseline/intercept (un-estimated).
>> Instead, it seems that only 'herbivorous' is absorbed in the
>> intercept, while all habitat classes have posterior estimates.
>>
>> My questions are:
>> 1- why does no habitat category get absorbed in the intercept? and
>> 2- what can I do to fix that?
>>
>> If it's at all useful, here is an example:
>>
>> ## data format
>> unique(Tdata$Habitat)
>> [1] TE AR AQ ST SA
>> Levels: AQ AR SA ST TE
>>
>> unique(Tdata$Diet)
>> [1] Herb Omni Faun
>> Levels: Faun Herb Omni
>>
>> ModTHRE <- MCMCglmm(Response ~ -1 + Habitat + Diet,
>> prior = list(R = list(V = 1, nu = 0.002)),
>> ginverse = list(Binomial=INphylo$Ainv),
>> burnin = 25000, nitt = 225000, thin = 200,
>> family = "threshold",
>> data = Tdata,
>> verbose = FALSE)
>>
>> ## model output
>> summary(ModTHRE)
>>
>> Iterations = 25001:224801
>> Thinning interval = 200
>> Sample size = 1000
>>
>> DIC: 2321.861
>>
>> R-structure: ~units
>>
>> post.mean l-95% CI u-95% CI eff.samp
>> units 1 0.9973 1.003 1000
>>
>> Location effects: Response ~ -1 + ForagingHab + Troph_Lev
>>
>> post.mean l-95% CI u-95% CI eff.samp pMCMC
>> HabitatST -0.07612 -0.42629 0.33746 1000.0 0.694
>> HabitatTE -0.36682 -0.52533 -0.21118 1000.0 <0.001 ***
>> HabitatSA -0.04691 -0.25769 0.19636 1000.0 0.724
>> HabitatAR -0.07302 -0.29900 0.16681 883.3 0.548
>> HabitatAQ -0.47948 -0.82598 -0.15793 1000.0 0.002 **
>> DietHerb 0.30708 0.11255 0.48545 1000.0 0.002 **
>> DietOmni 0.05263 -0.12176 0.26105 1000.0 0.604
>> ---
>> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>>
>> Cutpoints:
>> post.mean l-95% CI u-95% CI eff.samp
>> cutpoint.traitActivity.1 0.4911 0.436 0.5495 877.6
>>
>>
>>
>> Any thoughts, ideas of what's going on here, or corrections of my
>> mis-perceptions, are all very welcome.
>>
>> Many thanks,
>> Roi
>>
>>
>> Quoting Walid Mawass <walidmawass10 using gmail.com>:
>>
>>> Hello,
>>>
>>> by adding -1 into your model's formula, you are explicitly calling for no
>>> intercept term to be included in the estimates. If you do want an intercept
>>> term which would include a baseline level for each of your categorical
>>> variables then the syntax should be:
>>>
>>> MCMCglmm(Activity ~1 + Habitat + Diet + log(Mass) + Max.Temp... (explicit
>>> call for intercept term)
>>> or
>>> MCMCglmm(Activity ~ Habitat + Diet + log(Mass) + Max.Temp... (implicit call
>>> for intercept term)
>>>
>>> Hope this helps
>>> --
>>> Walid Mawass
>>> Ph.D. candidate in Cellular and Molecular Biology
>>> Population Genetics Laboratory
>>> University of Québec at Trois-Rivières
>>> 3351, boul. des Forges, C.P. 500
>>> Trois-Rivières (Québec) G9A 5H7
>>> Telephone: 819-376-5011 poste 3384
>>>
>>>
>>> On Wed, Nov 7, 2018 at 12:09 PM <roimaor using post.tau.ac.il> wrote:
>>>
>>>> Dear list members,
>>>>
>>>> I have a model with categorical response and categorical + continuous
>>>> predictors.
>>>>
>>>> My model has two categorical predictors: "diet" (3 levels) and
>>>> "habitat" (6 levels):
>>>>
>>>> THRE1 <- MCMCglmm(Activity ~ -1 + Habitat + Diet + log(Mass) + Max.Temp,
>>>> prior = list(R = list(V = 1, fix = 1)),
>>>> ginverse = list(Binomial=INphylo$Ainv),
>>>> family = "threshold",
>>>> data = Tdata)
>>>>
>>>> If I understand correctly, in this configuration the algorithm
>>>> shouldn't return estimated values for the effect of each level of a
>>>> categorical predictor, instead, it returns a contrast between that
>>>> level and another level which was arbitrarily chosen as the base
>>>> level. Each species (data point) has a value for each of these traits,
>>>> so I would expect them to be estimated independently, meaning that one
>>>> level of each predictor should be the 'baseline' and absorbed into the
>>>> global intercept. In that case I expect 2 contrasts to be returned for
>>>> diet categories and 5 contrasts for habitat.
>>>>
>>>> However, I get 2 estimates (presumably contrasts) for diet categories,
>>>> and 6 for habitat categories, i.e., no habitat category was designated
>>>> as baseline, which makes me question whether the estimates are
>>>> contrasts or actual effect sizes.
>>>>
>>>> My questions:
>>>> - Is the algorithm pooling all the predictor categories as if they
>>>> were a single trait with 8 levels?
>>>> - If the habitat effect estimates are contrasts - what are they compared
>>>> to?
>>>> - If they are effect sizes - what did I do to not get the contrasts as
>>>> I expected?
>>>>
>>>> Any help would be much appreciated!
>>>> Thanks,
>>>> --
>>>> Roi Maor
>>>> PhD candidate
>>>> School of Zoology, Tel Aviv University
>>>> Centre for Biodiversity and Environment Research, UCL
>>>>
>>>> _______________________________________________
>>>> R-sig-mixed-models using r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>>
>>
>> _______________________________________________
>> R-sig-mixed-models using r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>
>
> --
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
More information about the R-sig-mixed-models
mailing list