[R-sig-ME] Ordinal data analysis with MCMCglmm

Jarrod Hadfield j.hadfield at ed.ac.uk
Thu Feb 12 10:36:16 CET 2015


Hi Pierre,

Exactly. There's two ways to think about GLMM on categorical data. To  
take binary data and a probit link function, the probability of  
success is:

pnorm(eta,0,1)

where eta is the linear predictor. In family="ordinal" the linear predictor is

Xb+Zu+e

and the residual e is Gaussian with variance V.

Another way to think about it is a success happens when

Xb+Zu+e+epsilon > 0

where epsilon is from the unit normal (mean=0, variance=1)

e and epsilon are confounded and e+epsilon is Gaussian with zero mean  
and variance V+1.  The variance is not identifiable and has to be  
fixed, and so fixing V=1 gives the variance of e+epsilon as 2. The  
probability of success can also be written as:

pnorm(Xb+Zu,0,sqrt(V+1))

It might be tempting to set V=0 because then we end up with the  
`standard' probit regression:

pnorm(Xb+Zu,0,sqrt(1))

but MCMCglmm would complain because mixing requires V>0. This isn't a  
big deal because the location effects can be rescaled to what they  
would be under the standard parameterisation:

pnorm(Xb+Zu,0,sqrt(2)) = pnorm((Xb+Zu)/sqrt(2),0,1)

family="threshold" is pretty much the same except VAR(e) is set to  
zero implicitly and V in the prior designates VAR(epsilon). Setting it  
to one gives the standard parameterisation directly.

Cheers,

Jarrod























Quoting Pierre de Villemereuil <bonamy at horus.ens.fr> on Thu, 12 Feb  
2015 10:09:41 +0100:

> Hi!
>
> Sorry, I'm inviting myself in the conversation. Jarrod, could you be more
> specific on the difference between "ordinal" and "threshold"?
>
> I gather from you last updated "MCMCglmm Course Notes" that "threshold" uses
> the "latent residual variance" directly in the link function, whereas
> "ordinal" adds another variance of 1, is that right?
>
> Thank you in advance for the clarification.
>
> Cheers,
> Pierre.
>
> Le mercredi 11 février 2015, 15:25:27 Jarrod Hadfield a écrit :
>> Hi,
>>
>> The residual variance is not identifiable in the likelihood for
>> ordinal models - you need to fix it at something (e.g. 1):
>>
>> prior=list(R=list(V=1, fix=1), ...)
>>
>> Note that under family="ordinal" this is equivalent to probit
>> regression but with a variance of 2 (rather than the standard 1) in
>> the normal cumulative distribution function (i.e. pnorm).
>>
>> family="threhold" fits the same model but fixing the unit variance at
>> 1 gives the standard probit regression. This implementation samples
>> the posterior more efficiently too.
>>
>> Cheers,
>>
>> Jarrod
>>
>>
>> Quoting Luka Rubinjoni <rubinjoni at gmail.com> on Wed, 11 Feb 2015
>>
>> 15:13:59 +0100:
>> > Dear list members,
>> >
>> > 	I'm trying to analyze ordinal data with MCMCglmm, but I keep getting
>> >
>> > inconsistent results. I'm a biologist with no experience in Bayesian
>> > inference and mixed models, these are my first attempts...
>> >
>> > 	I'm exploring herbivory load on a dioecious plant (separate male and
>> >
>> > female sex). I used leaf damage as an estimate of herbivory load, by
>> > assigning each leaf with a 1 (no damage) to 4 (heavy/severe damage)
>> > score. I have sampled about 500 plants from 12 locations, and have about
>> > 10 leaves per plant. Plants do not have a constant number of leaves, it
>> > ranges from 6 to 17. All plants were collected from natural habitats -
>> > there was no experimental setup for a full factorial design. Also,
>> > uneven number of plants was sampled from various locations.
>> >
>> > 	I wish to determine if one of the sexes (male or female) is more
>> >
>> > susceptible to herbivore damage. I would also like to compare sites
>> > according to the observed damage, and explore sex*site interaction.
>> > Further, I would like to see if some other plant traits (except sex)
>> > have an impact on herbivory: plant height, plant mass, leaf number, leaf
>> > mass, flower/fruit mass & number, reproductive allocation...
>> >
>> > 	I tried running MCMCglmm procedure with the default prior (without
>> >
>> > declaring the prior), but I get awkward results (negative DICs, large
>> > differences in DICs between runs...). I'm not sure if I'm making any
>> > other mistakes, too. Do you have any suggestions for the prior
>> > specification?
>> >
>> > Best regards,
>> > Luka Rubinjoni
>> > Faculty of Biology, University of Belgrade
>> >
>> > P.S. Here's summary of the data, and output from one of the models:
>> >> summary(skorovi)
>> >>
>> >        ID           LeafID     score          N
>> >
>> >  K24M   :  17   Min.   :   1   1:2248   Min.   : 6.00
>> >  M9M    :  17   1st Qu.:1480   2:2982   1st Qu.:10.00
>> >  J10M   :  16   Median :2960   3: 564   Median :11.00
>> >  K18M   :  16   Mean   :2960   4: 124   Mean   :11.35
>> >  M17F   :  16   3rd Qu.:4439            3rd Qu.:12.00
>> >  M19M   :  16   Max.   :5918            Max.   :17.00
>> >  (Other):5820
>> >
>> >      PlantH         FlowerN       InternodeN
>> >
>> >  Min.   : 94.0   Min.   : 1.0   Min.   : 4.000
>> >  1st Qu.:204.0   1st Qu.: 5.0   1st Qu.: 5.000
>> >  Median :231.0   Median :11.0   Median : 6.000
>> >  Mean   :233.7   Mean   :15.6   Mean   : 6.131
>> >  3rd Qu.:265.0   3rd Qu.:24.0   3rd Qu.: 7.000
>> >  Max.   :398.0   Max.   :68.0   Max.   :10.000
>> >
>> >      PlantM           LeafM        Sex           Site
>> >
>> >  Min.   :0.1690   Min.   :0.1170   F:2891   A2S    : 875
>> >  1st Qu.:0.4760   1st Qu.:0.2950   M:3027   A      : 805
>> >  Median :0.6120   Median :0.3910            K      : 784
>> >  Mean   :0.6816   Mean   :0.4316            M      : 566
>> >  3rd Qu.:0.8500   3rd Qu.:0.5310            DJ     : 479
>> >  Max.   :2.3020   Max.   :1.5610            P      : 454
>> >
>> >                                             (Other):1955
>> >
>> >        RA           InfloresenceN      FlowerM
>> >
>> >  Min.   :0.001348   Min.   :1.000   Min.   :0.00100
>> >  1st Qu.:0.011674   1st Qu.:4.000   1st Qu.:0.00700
>> >  Median :0.024943   Median :4.000   Median :0.01700
>> >  Mean   :0.065849   Mean   :4.492   Mean   :0.04613
>> >  3rd Qu.:0.097990   3rd Qu.:5.000   3rd Qu.:0.05600
>> >  Max.   :0.838617   Max.   :9.000   Max.   :0.56100
>> >
>> >                     NA's   :2891
>> >
>> > CMC200kIDSite <- MCMCglmm(score~Sex, ~ID+Site, rcov=~units,
>> > family="ordinal", data=skorovi, DIC=TRUE, nitt=200000, thin=100,
>> > burnin=5000)
>> >
>> > ...
>> >
>> >  MCMC iteration = 200000
>> >
>> >   Acceptance ratio for latent scores = 0.897869
>> >
>> >  Acceptance ratio for cutpoint set 1 = 0.779000
>> >
>> >> summary(MCMC200kIDSite)
>> >>
>> >  Iterations = 5001:199901
>> >  Thinning interval  = 100
>> >  Sample size  = 1950
>> >
>> >  DIC: -9100.507
>> >
>> >  G-structure:  ~ID
>> >
>> >    post.mean l-95% CI u-95% CI eff.samp
>> >
>> > ID     12.02   0.3165    24.33    1.436
>> >
>> >                ~Site
>> >
>> >      post.mean l-95% CI u-95% CI eff.samp
>> >
>> > Site     17.89   0.1927    52.11    5.023
>> >
>> >  R-structure:  ~units
>> >
>> >       post.mean l-95% CI u-95% CI eff.samp
>> >
>> > units     40.44    0.198    80.42    1.478
>> >
>> >  Location effects: score ~ Sex
>> >
>> >             post.mean l-95% CI u-95% CI eff.samp  pMCMC
>> >
>> > (Intercept)   2.38948  0.01827  5.96931    5.605 0.0359 *
>> > SexM          0.11478 -0.64723  0.90102 1950.000 0.7426
>> > ---
>> > Signif. codes:  0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1
>> >
>> >  Cutpoints:
>> >                       post.mean l-95% CI u-95% CI eff.samp
>> >
>> > cutpoint.traitscore.1     10.74    2.051    16.73    1.347
>> > cutpoint.traitscore.2     16.79    3.203    26.18    1.323
>> >
>> >> autocorr(MCMC200kIDSite$Sol)
>> >
>> > , , (Intercept)
>> >
>> >          (Intercept)       SexM
>> >
>> > Lag 0      1.0000000 0.02834661
>> > Lag 100    0.4652665 0.12773192
>> > Lag 500    0.4637719 0.12885704
>> > Lag 1000   0.4608056 0.10518225
>> > Lag 5000   0.4426579 0.10261122
>> >
>> > , , SexM
>> >
>> >          (Intercept)        SexM
>> >
>> > Lag 0     0.02834661 1.000000000
>> > Lag 100   0.10172481 0.000939134
>> > Lag 500   0.13601706 0.003533253
>> > Lag 1000  0.13280788 0.040197635
>> > Lag 5000  0.11814053 0.070899772
>> >
>> >> autocorr(MCMC200kIDSite$VCV)
>> >
>> > , , ID
>> >
>> >                 ID      Site     units
>> >
>> > Lag 0    1.0000000 0.7125238 0.9873699
>> > Lag 100  0.9770221 0.7118302 0.9862788
>> > Lag 500  0.9692561 0.7054513 0.9799585
>> > Lag 1000 0.9644127 0.6951924 0.9737191
>> > Lag 5000 0.9104725 0.6701422 0.9193135
>> >
>> > , , Site
>> >
>> >                 ID      Site     units
>> >
>> > Lag 0    0.7125238 1.0000000 0.7214728
>> > Lag 100  0.7095693 0.4868717 0.7170873
>> > Lag 500  0.7028373 0.5282474 0.7136912
>> > Lag 1000 0.7045606 0.5254454 0.7081390
>> > Lag 5000 0.6686113 0.4824315 0.6739685
>> >
>> > , , units
>> >
>> >                 ID      Site     units
>> >
>> > Lag 0    0.9873699 0.7214728 1.0000000
>> > Lag 100  0.9864973 0.7192773 0.9971748
>> > Lag 500  0.9810497 0.7137308 0.9910396
>> > Lag 1000 0.9743296 0.7063965 0.9838989
>> > Lag 5000 0.9213108 0.6722353 0.9303135
>> >
>> > _______________________________________________
>> > R-sig-mixed-models at r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>
>



-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.



More information about the R-sig-mixed-models mailing list