[R-sig-ME] MCMCglmm with multinomial models

Wed Jul 20 09:05:44 CEST 2022

Hi Jess

In multinomial models the linear model is set up as a (logit) difference in probability between an outcome and some base-line outcome. Often, as here, the base-line outcome is arbitrary, and so the idh structure is a little odd. For example, if A is the base line category, idh assumes COV(B-A, C-A) = 0 which therefore assumes
COV(B,C)+VAR(A) =COV(A,B)+COV(C,A). It's not clear why this would be the case. Perhaps a more reasonable, but less parameter rich, option would be to have:

~idv(Section+trait:Section)

which parameterises the Section covariance matrix by a single parameter (rather than 6). The term idv(Section+trait:Section) fits a 3x3 covariance matrix of the form v*(I+J) where v is the estimated variance. This assumes i) Sections are repeatable in outcome, but knowing that a Section has an increased 'preference' for A doesn’t tell you whether it also has an increased preference for one of the other categories and ii) the repeatability for each outcome within sites is the same (on the latent scale).

To test groups of effects (in your case the 3 culling:trait effects), I usually use a Wald test and the posterior covariances (see here https://stat.ethz.ch/pipermail/r-sig-mixed-models/2017q3/025930.html). It's far from correct and so Walid's suggestions may be better, but small-scale simulations suggests it has good frequentist properties.

To add predator presence you can just add a predator:trait effect into the linear model. If the culling and predator factors do not vary within sites then you probably don't have enough information to reliably estimate these effects.

Cheers,

Jarrod

> On 19 Jul 2022, at 18:17, Walid Mawass <walidmawass10 using gmail.com> wrote:
>
> This email was sent to you by someone outside the University.
> You should only click on links or attachments if you are certain that the email is genuine and the content is safe.
>
> Hey Jess,
>
> 1) Yes that is correct
>
> 2) To my knowledge there is a rule of thumb, where you set the nitt (# of
> iterations) to a large number that includes the burnin amount, then you
> choose your thinning interval (sampling of the chain). For example, this is
> what I would use: nitt= 150000, burnin=50000, thin=100. This will give you
> a decent burnin and a final sample of 1000 saved iterations. Note however
> that this does not have to increase the effective sample size for certain
> variables, but it might do the trick.
>
> 3) hmm...I think one way to do it is to make predictions using the above
> model and interpret the patterns you see for each relationship you are
> interested in. Another way to compare effect size would be to use bayesian
> posterior indices. I suggest these two papers by Makowski et al. (2019a &
> b) that present both interesting posterior indices to use with Bayesian
> statistical analysis and an associated R package that does the job of
> computing these indices, *bayestestR*.
>
> Good luck
> --
> Walid Mawass
> Ph.D. candidate in Evolutionary Biology - UQTR
> *Currently* Postdoctoral Research Associate
> Masel Lab - University of Arizona
>
>
> On Sun, Jul 17, 2022 at 11:32 PM jessica comley <jessiecomley44 using gmail.com>
> wrote:
>
>> Hi Walid,
>>
>> Thank you for your reply, I greatly appreciate it. I have a few more
>> questions and if you could help that would be great.
>>
>> I tested for correlation between activities and the 14 Sections and the
>> correlation comes out as low. Therefore I have changed my code to use idh()
>> instead of us as suggested:
>>
>> test1c.5b <- MCMCglmm(activity ~ -1 + at.level(culling,1):trait +
>> at.level(culling, 2):trait, random=~idh(trait):Section, rcov =
>> ~idh(trait):units, data = caracal, family = "categorical", prior = prior,
>> burnin=5000, nitt=80000)
>>
>> 1) Is this correct?
>>
>> 2) Increasing the number of interactions increases the effective sample
>> size, therefore is there a general rule of thumb as to how large your
>> effective sample size should be?
>>
>> 3) I understand how to use and interpret the results of HPDinterval (i.e.
>> if intervals do not overlap 0 then relationship is strong), but how am I
>> able to test the relationship between all four activities and fixed effects
>> and not just have the three categories (i.e. diurnal, dusk, nocturnal)
>> compared to the base category (dawn)? For example, I am also interested in
>> whether there is a significant/strong relationship between activities of
>> caracal at dusk with culling(Lethal)/no culling(none) compared to
>> activities of caracal at diurnal with culling(Lethal)/no culling(none).
>>
>> Below is an example of our dataset:
>> Camera Section CameraID Animal predator culling activity
>> 1a Bucklands Bucklands1a Caracal low Lethal diurnal
>> 1a Bucklands Bucklands1a Caracal low Lethal dawn
>> 2a Bucklands Bucklands2a Caracal low Lethal dusk
>> 2a Bucklands Bucklands2a Caracal low Lethal diurnal
>> 3a Bucklands Bucklands3a Caracal low Lethal dawn
>> Cam 1  Colchester ColchesterCam 1  Caracal high none diurnal
>> Cam 1  Colchester ColchesterCam 1  Caracal high none diurnal
>> Cam 1  Colchester ColchesterCam 1  Caracal high none diurnal
>> Cam 1  Colchester ColchesterCam 1  Caracal high none diurnal
>> Cam 2  Colchester ColchesterCam 2  Caracal high none diurnal
>> Cam 2  Colchester ColchesterCam 2  Caracal high none diurnal
>> Cam 3  Colchester ColchesterCam 3  Caracal high none diurnal
>> Cam 3  Colchester ColchesterCam 3  Caracal high none diurnal
>> Cam 3  Colchester ColchesterCam 3  Caracal high none diurnal
>> Cam 4  Colchester ColchesterCam 4  Caracal high none diurnal
>> Cam 4  Colchester ColchesterCam 4  Caracal high none diurnal
>> Cam 4  Colchester ColchesterCam 4  Caracal high none nocturnal
>> 1a Connaught Connaught1a Caracal low Lethal nocturnal
>> 1a Connaught Connaught1a Caracal low Lethal nocturnal
>> 1d Connaught Connaught1d Caracal low Lethal diurnal
>> 3B Connaught Connaught3B Caracal low Lethal diurnal
>> 3B Connaught Connaught3B Caracal low Lethal diurnal
>> 4a Connaught Connaught4a Caracal low Lethal nocturnal
>> 4a Connaught Connaught4a Caracal low Lethal nocturnal
>> 4b Connaught Connaught4b Caracal low Lethal diurnal
>> 6a Connaught Connaught6a Caracal low Lethal nocturnal
>> 6b Connaught Connaught6b Caracal low Lethal diurnal
>> 7a Connaught Connaught7a Caracal low Lethal nocturnal
>> 9a Connaught Connaught9a Caracal low Lethal nocturnal
>> 9d Connaught Connaught9d Caracal low Lethal nocturnal
>> 9d Connaught Connaught9d Caracal low Lethal dusk
>> 7d Diepdam Diepdam7d Caracal absent Lethal dusk
>> 8d Diepdam Diepdam8d Caracal absent Lethal diurnal
>> 9c Diepdam Diepdam9c Caracal absent Lethal nocturnal
>>
>> All the best,
>> Jess
>>
>>
>> On Fri, Jul 15, 2022 at 11:37 PM Walid Mawass <walidmawass10 using gmail.com>
>> wrote:
>>
>>> Hello,
>>>
>>> I don't think I can specifically help you with some of your inquiries.
>>> However, I do want to comment on a few things that might need some
>>> attention.
>>>
>>> First, MCMCglmm is based on a Bayesian implementation and does not
>>> compute p-values to compare. What you need to compare are the posterior
>>> distributions of your effect sizes. This can be done visually using the
>>> base plot function in R. Or by comparing the HPD intervals and the mode (or
>>> mean) of the posterior distributions.
>>>
>>> Second, I have no idea what your data structure looks like (which makes
>>> it hard to interpret model results), but the effective sample size (from
>>> the 5500 saved iterations sample) for your random variable Section is very
>>> low (the same applies for your fixed effects). You should consider this
>>> issue and look again at your assumption of correlation between
>>> activities for the 14 sections you have in your dataset. If you do not
>>> expect among activity correlations then you can use the idh() function
>>> instead of us().
>>>
>>> Hopefully this helps and in hope that people on this list with more
>>> knowledge of these models will help out.
>>>
>>> Best,
>>> --
>>> Walid Mawass
>>> Ph.D. candidate in Evolutionary Biology - UQTR
>>> *Currently* Postdoctoral Research Associate
>>> Masel Lab - University of Arizona
>>>
>>>
>>> On Fri, Jul 15, 2022 at 8:49 AM jessica comley <jessiecomley44 using gmail.com>
>>> wrote:
>>>
>>>> Dear all,
>>>>
>>>> I am hoping that someone will be able to help me with conducting MCMCglmm
>>>> multinomial models.
>>>>
>>>> The data I am working with is for black-backed jackal (bbj) and carcal.
>>>> For
>>>> each species we have a multinomial response variable called activity
>>>> which
>>>> has four categories (dawn, diurnal, dusk, nocturnal). We have two
>>>> categorical fixed effects which are 1) culling (none, lethal) and 2)
>>>> predator presence (absent, high, low). We also have a categorical
>>>> variable
>>>> called Section (made up of 14 different reserves/ farms where the
>>>> activity
>>>> of caracal and bbj were recorded). There are 273 observations for caracal
>>>> and 4399 for bbj. We are wanting to test the effects of culling and
>>>> predators on caracal and bbj activity separately.
>>>>
>>>> I have been working through Jarrod Hadfields course notes, particularly
>>>> with regards to Chapter 5.2. The chi-square analyses reveal that the
>>>> frequencies of culling and predators differ as do activities.
>>>>
>>>> I have managed to work out the specific probabilities for the culling
>>>> none
>>>> vs culling lethal for each activity (dawn, diurnal, dusk, nocturnal) for
>>>> caracal, but I'm confused as to how to determine p-values to determine
>>>> which activities culling none vs culling lethal are affecting?
>>>>
>>>> Myy code and outcomes are pasted below with questions stated in bold.
>>>>
>>>> caracal2 <- read.csv("caracal_new.csv", header=T)
>>>> caracal <- as.data.frame(unclass(caracal2), stringsAsFactors = TRUE)
>>>>
>>>> #Chi-squared tests
>>>> Ctable1 <- table(caracal$activity, caracal$culling)
>>>> chisq.test(rowSums(Ctable1)) #strongly suggests activities differ
>>>> chisq.test(Ctable1)#strongly suggests culling category differs
>>>>
>>>> Ctable2 <- table(caracal$activity, caracal$predator)
>>>> chisq.test(rowSums(Ctable2))#strongly suggests activities differ
>>>> chisq.test(Ctable2)#strongly suggests predator category differs
>>>>
>>>> prior = list(R = list(fix=1, V=(1/k) * (I + J)), G = list(G1=list(V =
>>>> diag(k-1), nu=1)))
>>>> test1c.5 <- MCMCglmm(activity ~ -1 + at.level(culling,1):trait +
>>>> at.level(culling, 2):trait, random=~us(trait):Section, rcov =
>>>> ~us(trait):units, data = caracal, family = "categorical", prior = prior,
>>>> burnin=5000, nitt=60000)
>>>> *##I'm not sure how to add the three predator levels to this model or if
>>>> it
>>>> would be appropriate?*
>>>>
>>>>
>>>> k <- length(levels(caracal$activity))
>>>> I <- diag(k-1)
>>>> J <- matrix(rep(1, (k-1)^2), c(k-1, k-1))
>>>> IJ <- (1/k) *(diag(k-1) + matrix(1,k-1, k-1))
>>>>
>>>> contrasts(caracal$activity)
>>>>
>>>> #culling lethal
>>>> Delta <- cbind(c(0,1,0,0), c(0,0,1,0), c(0,0,0,1))
>>>> c2 <- (16 * sqrt(3)/(15 * pi))^2
>>>> D <- ginv(Delta %*% t(Delta)) %*% Delta
>>>> Int <- t(apply(test1c.5$Sol[,1:3],1, function(x) + D %*% (x/sqrt(1 + c2 *
>>>> diag(IJ)))))
>>>> summary(mcmc(exp(Int)/rowSums(exp(Int))))
>>>>
>>>> prop.table(Ctable1[,1])
>>>>
>>>> #culling none
>>>> Delta <- cbind(c(0,1,0,0), c(0,0,1,0), c(0,0,0,1))
>>>> c2 <- (16 * sqrt(3)/(15 * pi))^2
>>>> D <- ginv(Delta %*% t(Delta)) %*% Delta
>>>> Int <- t(apply(test1c.5$Sol[,4:6],1, function(x) + D %*% (x/sqrt(1 + c2 *
>>>> diag(IJ)))))
>>>> summary(mcmc(exp(Int)/rowSums(exp(Int))))
>>>>
>>>> prop.table((Ctable1[,2]))
>>>>
>>>> HPDinterval(test1c.5$Sol)
>>>>
>>>> #model summary
>>>>> summary(test1c.5)
>>>>
>>>> Iterations = 5001:59991
>>>> Thinning interval  = 10
>>>> Sample size  = 5500
>>>>
>>>> DIC: 699.7014
>>>>
>>>> G-structure:  ~us(trait):Section
>>>>
>>>>                                                        post.mean l-95%
>>>> CI
>>>> u-95% CI eff.samp
>>>> traitactivity.diurnal:traitactivity.diurnal.Section        1.8124
>>>> 0.09784
>>>>   5.665    77.01
>>>> traitactivity.dusk:traitactivity.diurnal.Section           0.8450
>>>> -0.83585
>>>>   3.856    64.17
>>>> traitactivity.nocturnal:traitactivity.diurnal.Section      1.3621
>>>> -1.19129
>>>>   6.157    58.48
>>>> traitactivity.diurnal:traitactivity.dusk.Section           0.8450
>>>> -0.83585
>>>>   3.856    64.17
>>>> traitactivity.dusk:traitactivity.dusk.Section              1.2034
>>>> 0.07090
>>>>   3.681   102.16
>>>> traitactivity.nocturnal:traitactivity.dusk.Section         0.7505
>>>> -1.77113
>>>>   4.524    43.53
>>>> traitactivity.diurnal:traitactivity.nocturnal.Section      1.3621
>>>> -1.19129
>>>>   6.157    58.48
>>>> traitactivity.dusk:traitactivity.nocturnal.Section         0.7505
>>>> -1.77113
>>>>   4.524    43.53
>>>> traitactivity.nocturnal:traitactivity.nocturnal.Section    2.7148
>>>> 0.09401
>>>>   8.397    76.59
>>>>
>>>> R-structure:  ~us(trait):units
>>>>
>>>>                                                      post.mean l-95% CI
>>>> u-95% CI eff.samp
>>>> traitactivity.diurnal:traitactivity.diurnal.units          0.50     0.50
>>>>  0.50        0
>>>> traitactivity.dusk:traitactivity.diurnal.units             0.25     0.25
>>>>  0.25        0
>>>> traitactivity.nocturnal:traitactivity.diurnal.units        0.25     0.25
>>>>  0.25        0
>>>> traitactivity.diurnal:traitactivity.dusk.units             0.25     0.25
>>>>  0.25        0
>>>> traitactivity.dusk:traitactivity.dusk.units                0.50     0.50
>>>>  0.50        0
>>>> traitactivity.nocturnal:traitactivity.dusk.units           0.25     0.25
>>>>  0.25        0
>>>> traitactivity.diurnal:traitactivity.nocturnal.units        0.25     0.25
>>>>  0.25        0
>>>> traitactivity.dusk:traitactivity.nocturnal.units           0.25     0.25
>>>>  0.25        0
>>>> traitactivity.nocturnal:traitactivity.nocturnal.units      0.50     0.50
>>>>  0.50        0
>>>>
>>>> Location effects: activity ~ -1 + at.level(culling, 1):trait +
>>>> at.level(culling, 2):trait
>>>>
>>>>                                             post.mean l-95% CI u-95% CI
>>>> eff.samp  pMCMC
>>>> at.level(culling, 1):traitactivity.diurnal      1.2306  -0.0533   2.6793
>>>> 145.29 0.0418 *
>>>> at.level(culling, 1):traitactivity.dusk         0.6605  -0.6006   2.0761
>>>> 92.91 0.2840
>>>> at.level(culling, 1):traitactivity.nocturnal    1.6090   0.0914   3.1356
>>>> 151.02 0.0265 *
>>>> traitactivity.diurnal:at.level(culling, 2)      1.2664  -0.1552   2.7750
>>>> 226.40 0.0604 .
>>>> traitactivity.dusk:at.level(culling, 2)         0.3533  -0.9898   1.5218
>>>> 148.44 0.5447
>>>> traitactivity.nocturnal:at.level(culling, 2)    1.0447  -0.6405   2.8354
>>>> 346.40 0.1618
>>>> ---
>>>> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>>>>
>>>> *##So for the model summary I get that lethal culling at activity diurnal
>>>> is significantly different from lethal culling at dawn (its the base
>>>> reference), but I'm also interested in whether lethal culling at activity
>>>> diurnal is different from lethal culling at dusk for example. Is this
>>>> possible? *
>>>>
>>>> #outcomes culling lethal
>>>>> summary(mcmc(exp(Int)/rowSums(exp(Int))))
>>>>
>>>> Iterations = 1:5500
>>>> Thinning interval = 1
>>>> Number of chains = 1
>>>> Sample size per chain = 5500
>>>>
>>>> 1. Empirical mean and standard deviation for each variable,
>>>>   plus standard error of the mean:
>>>>
>>>>       Mean      SD  Naive SE Time-series SE
>>>> [1,] 0.1253 0.05565 0.0007504       0.002484
>>>> [2,] 0.3748 0.10497 0.0014155       0.003204
>>>> [3,] 0.1757 0.06640 0.0008954       0.002515
>>>> [4,] 0.3242 0.11939 0.0016099       0.003514
>>>>
>>>> 2. Quantiles for each variable:
>>>>
>>>>        2.5%     25%    50%    75%  97.5%
>>>> var1 0.03641 0.08695 0.1198 0.1554 0.2553
>>>> var2 0.17298 0.30580 0.3704 0.4431 0.5896
>>>> var3 0.06166 0.12913 0.1705 0.2161 0.3215
>>>> var4 0.12610 0.23999 0.3090 0.3901 0.6045
>>>>
>>>>> prop.table(Ctable1[,1])
>>>>     dawn   diurnal      dusk nocturnal
>>>> 0.1250000 0.2812500 0.1770833 0.4166667
>>>>
>>>>
>>>> #outcomes culling none
>>>>> summary(mcmc(exp(Int)/rowSums(exp(Int))))
>>>>
>>>> Iterations = 1:5500
>>>> Thinning interval = 1
>>>> Number of chains = 1
>>>> Sample size per chain = 5500
>>>>
>>>> 1. Empirical mean and standard deviation for each variable,
>>>>   plus standard error of the mean:
>>>>
>>>>       Mean      SD  Naive SE Time-series SE
>>>> [1,] 0.1288 0.06141 0.0008280       0.002787
>>>> [2,] 0.3804 0.10406 0.0014032       0.002662
>>>> [3,] 0.1710 0.06844 0.0009228       0.002592
>>>> [4,] 0.3198 0.11812 0.0015928       0.002956
>>>>
>>>> 2. Quantiles for each variable:
>>>>
>>>>        2.5%     25%    50%    75%  97.5%
>>>> var1 0.02891 0.08896 0.1220 0.1594 0.2685
>>>> var2 0.18007 0.31094 0.3783 0.4474 0.5965
>>>> var3 0.05840 0.12425 0.1634 0.2083 0.3250
>>>> var4 0.12430 0.23921 0.3077 0.3862 0.5964
>>>>
>>>>> prop.table((Ctable1[,2]))
>>>>     dawn   diurnal      dusk nocturnal
>>>> 0.1306818 0.4375000 0.1875000 0.2443182
>>>>
>>>> Any help or guidance will be greatly appreciated.
>>>>
>>>> All the best,
>>>> Jess
>>>>
>>>> --
>>>> Jessica Comley (PhD)
>>>> Research Scientist
>>>>
>>>>        [[alternative HTML version deleted]]
>>>>
>>>> _______________________________________________
>>>> R-sig-mixed-models using r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>>
>>>
>>
>> --
>> Jessica Comley (PhD)
>> Research Scientist
>>
>>
>
>        [[alternative HTML version deleted]]
>
> _______________________________________________
> R-sig-mixed-models using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. Is e buidheann carthannais a th’ ann an Oilthigh Dhùn Èideann, clàraichte an Alba, àireamh clàraidh SC005336.