[R-sig-ME] Ordinal data analysis with MCMCglmm
Jarrod Hadfield
j.hadfield at ed.ac.uk
Thu Feb 12 10:36:16 CET 2015
Hi Pierre,
Exactly. There's two ways to think about GLMM on categorical data. To
take binary data and a probit link function, the probability of
success is:
pnorm(eta,0,1)
where eta is the linear predictor. In family="ordinal" the linear predictor is
Xb+Zu+e
and the residual e is Gaussian with variance V.
Another way to think about it is a success happens when
Xb+Zu+e+epsilon > 0
where epsilon is from the unit normal (mean=0, variance=1)
e and epsilon are confounded and e+epsilon is Gaussian with zero mean
and variance V+1. The variance is not identifiable and has to be
fixed, and so fixing V=1 gives the variance of e+epsilon as 2. The
probability of success can also be written as:
pnorm(Xb+Zu,0,sqrt(V+1))
It might be tempting to set V=0 because then we end up with the
`standard' probit regression:
pnorm(Xb+Zu,0,sqrt(1))
but MCMCglmm would complain because mixing requires V>0. This isn't a
big deal because the location effects can be rescaled to what they
would be under the standard parameterisation:
pnorm(Xb+Zu,0,sqrt(2)) = pnorm((Xb+Zu)/sqrt(2),0,1)
family="threshold" is pretty much the same except VAR(e) is set to
zero implicitly and V in the prior designates VAR(epsilon). Setting it
to one gives the standard parameterisation directly.
Cheers,
Jarrod
Quoting Pierre de Villemereuil <bonamy at horus.ens.fr> on Thu, 12 Feb
2015 10:09:41 +0100:
> Hi!
>
> Sorry, I'm inviting myself in the conversation. Jarrod, could you be more
> specific on the difference between "ordinal" and "threshold"?
>
> I gather from you last updated "MCMCglmm Course Notes" that "threshold" uses
> the "latent residual variance" directly in the link function, whereas
> "ordinal" adds another variance of 1, is that right?
>
> Thank you in advance for the clarification.
>
> Cheers,
> Pierre.
>
> Le mercredi 11 février 2015, 15:25:27 Jarrod Hadfield a écrit :
>> Hi,
>>
>> The residual variance is not identifiable in the likelihood for
>> ordinal models - you need to fix it at something (e.g. 1):
>>
>> prior=list(R=list(V=1, fix=1), ...)
>>
>> Note that under family="ordinal" this is equivalent to probit
>> regression but with a variance of 2 (rather than the standard 1) in
>> the normal cumulative distribution function (i.e. pnorm).
>>
>> family="threhold" fits the same model but fixing the unit variance at
>> 1 gives the standard probit regression. This implementation samples
>> the posterior more efficiently too.
>>
>> Cheers,
>>
>> Jarrod
>>
>>
>> Quoting Luka Rubinjoni <rubinjoni at gmail.com> on Wed, 11 Feb 2015
>>
>> 15:13:59 +0100:
>> > Dear list members,
>> >
>> > I'm trying to analyze ordinal data with MCMCglmm, but I keep getting
>> >
>> > inconsistent results. I'm a biologist with no experience in Bayesian
>> > inference and mixed models, these are my first attempts...
>> >
>> > I'm exploring herbivory load on a dioecious plant (separate male and
>> >
>> > female sex). I used leaf damage as an estimate of herbivory load, by
>> > assigning each leaf with a 1 (no damage) to 4 (heavy/severe damage)
>> > score. I have sampled about 500 plants from 12 locations, and have about
>> > 10 leaves per plant. Plants do not have a constant number of leaves, it
>> > ranges from 6 to 17. All plants were collected from natural habitats -
>> > there was no experimental setup for a full factorial design. Also,
>> > uneven number of plants was sampled from various locations.
>> >
>> > I wish to determine if one of the sexes (male or female) is more
>> >
>> > susceptible to herbivore damage. I would also like to compare sites
>> > according to the observed damage, and explore sex*site interaction.
>> > Further, I would like to see if some other plant traits (except sex)
>> > have an impact on herbivory: plant height, plant mass, leaf number, leaf
>> > mass, flower/fruit mass & number, reproductive allocation...
>> >
>> > I tried running MCMCglmm procedure with the default prior (without
>> >
>> > declaring the prior), but I get awkward results (negative DICs, large
>> > differences in DICs between runs...). I'm not sure if I'm making any
>> > other mistakes, too. Do you have any suggestions for the prior
>> > specification?
>> >
>> > Best regards,
>> > Luka Rubinjoni
>> > Faculty of Biology, University of Belgrade
>> >
>> > P.S. Here's summary of the data, and output from one of the models:
>> >> summary(skorovi)
>> >>
>> > ID LeafID score N
>> >
>> > K24M : 17 Min. : 1 1:2248 Min. : 6.00
>> > M9M : 17 1st Qu.:1480 2:2982 1st Qu.:10.00
>> > J10M : 16 Median :2960 3: 564 Median :11.00
>> > K18M : 16 Mean :2960 4: 124 Mean :11.35
>> > M17F : 16 3rd Qu.:4439 3rd Qu.:12.00
>> > M19M : 16 Max. :5918 Max. :17.00
>> > (Other):5820
>> >
>> > PlantH FlowerN InternodeN
>> >
>> > Min. : 94.0 Min. : 1.0 Min. : 4.000
>> > 1st Qu.:204.0 1st Qu.: 5.0 1st Qu.: 5.000
>> > Median :231.0 Median :11.0 Median : 6.000
>> > Mean :233.7 Mean :15.6 Mean : 6.131
>> > 3rd Qu.:265.0 3rd Qu.:24.0 3rd Qu.: 7.000
>> > Max. :398.0 Max. :68.0 Max. :10.000
>> >
>> > PlantM LeafM Sex Site
>> >
>> > Min. :0.1690 Min. :0.1170 F:2891 A2S : 875
>> > 1st Qu.:0.4760 1st Qu.:0.2950 M:3027 A : 805
>> > Median :0.6120 Median :0.3910 K : 784
>> > Mean :0.6816 Mean :0.4316 M : 566
>> > 3rd Qu.:0.8500 3rd Qu.:0.5310 DJ : 479
>> > Max. :2.3020 Max. :1.5610 P : 454
>> >
>> > (Other):1955
>> >
>> > RA InfloresenceN FlowerM
>> >
>> > Min. :0.001348 Min. :1.000 Min. :0.00100
>> > 1st Qu.:0.011674 1st Qu.:4.000 1st Qu.:0.00700
>> > Median :0.024943 Median :4.000 Median :0.01700
>> > Mean :0.065849 Mean :4.492 Mean :0.04613
>> > 3rd Qu.:0.097990 3rd Qu.:5.000 3rd Qu.:0.05600
>> > Max. :0.838617 Max. :9.000 Max. :0.56100
>> >
>> > NA's :2891
>> >
>> > CMC200kIDSite <- MCMCglmm(score~Sex, ~ID+Site, rcov=~units,
>> > family="ordinal", data=skorovi, DIC=TRUE, nitt=200000, thin=100,
>> > burnin=5000)
>> >
>> > ...
>> >
>> > MCMC iteration = 200000
>> >
>> > Acceptance ratio for latent scores = 0.897869
>> >
>> > Acceptance ratio for cutpoint set 1 = 0.779000
>> >
>> >> summary(MCMC200kIDSite)
>> >>
>> > Iterations = 5001:199901
>> > Thinning interval = 100
>> > Sample size = 1950
>> >
>> > DIC: -9100.507
>> >
>> > G-structure: ~ID
>> >
>> > post.mean l-95% CI u-95% CI eff.samp
>> >
>> > ID 12.02 0.3165 24.33 1.436
>> >
>> > ~Site
>> >
>> > post.mean l-95% CI u-95% CI eff.samp
>> >
>> > Site 17.89 0.1927 52.11 5.023
>> >
>> > R-structure: ~units
>> >
>> > post.mean l-95% CI u-95% CI eff.samp
>> >
>> > units 40.44 0.198 80.42 1.478
>> >
>> > Location effects: score ~ Sex
>> >
>> > post.mean l-95% CI u-95% CI eff.samp pMCMC
>> >
>> > (Intercept) 2.38948 0.01827 5.96931 5.605 0.0359 *
>> > SexM 0.11478 -0.64723 0.90102 1950.000 0.7426
>> > ---
>> > Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1
>> >
>> > Cutpoints:
>> > post.mean l-95% CI u-95% CI eff.samp
>> >
>> > cutpoint.traitscore.1 10.74 2.051 16.73 1.347
>> > cutpoint.traitscore.2 16.79 3.203 26.18 1.323
>> >
>> >> autocorr(MCMC200kIDSite$Sol)
>> >
>> > , , (Intercept)
>> >
>> > (Intercept) SexM
>> >
>> > Lag 0 1.0000000 0.02834661
>> > Lag 100 0.4652665 0.12773192
>> > Lag 500 0.4637719 0.12885704
>> > Lag 1000 0.4608056 0.10518225
>> > Lag 5000 0.4426579 0.10261122
>> >
>> > , , SexM
>> >
>> > (Intercept) SexM
>> >
>> > Lag 0 0.02834661 1.000000000
>> > Lag 100 0.10172481 0.000939134
>> > Lag 500 0.13601706 0.003533253
>> > Lag 1000 0.13280788 0.040197635
>> > Lag 5000 0.11814053 0.070899772
>> >
>> >> autocorr(MCMC200kIDSite$VCV)
>> >
>> > , , ID
>> >
>> > ID Site units
>> >
>> > Lag 0 1.0000000 0.7125238 0.9873699
>> > Lag 100 0.9770221 0.7118302 0.9862788
>> > Lag 500 0.9692561 0.7054513 0.9799585
>> > Lag 1000 0.9644127 0.6951924 0.9737191
>> > Lag 5000 0.9104725 0.6701422 0.9193135
>> >
>> > , , Site
>> >
>> > ID Site units
>> >
>> > Lag 0 0.7125238 1.0000000 0.7214728
>> > Lag 100 0.7095693 0.4868717 0.7170873
>> > Lag 500 0.7028373 0.5282474 0.7136912
>> > Lag 1000 0.7045606 0.5254454 0.7081390
>> > Lag 5000 0.6686113 0.4824315 0.6739685
>> >
>> > , , units
>> >
>> > ID Site units
>> >
>> > Lag 0 0.9873699 0.7214728 1.0000000
>> > Lag 100 0.9864973 0.7192773 0.9971748
>> > Lag 500 0.9810497 0.7137308 0.9910396
>> > Lag 1000 0.9743296 0.7063965 0.9838989
>> > Lag 5000 0.9213108 0.6722353 0.9303135
>> >
>> > _______________________________________________
>> > R-sig-mixed-models at r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>
>
--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
More information about the R-sig-mixed-models
mailing list