# [R-sig-ME] Mixed-model polytomous ordered logistic regression ?

Rune Haubo rhbc at imm.dtu.dk
Mon Jan 3 19:54:19 CET 2011

```
Hi Emmanuel,

You can use the clmm() function in package ordinal to fit cumulative
link mixed models, which include the proportional odds mixed model
you mention. The non-mixed equivalent, clm(), gives the same answers
as polr() and glm() on your example data.

clmm() currently only allows for one random intercept, but Ken
Knoblauch proposed a polmer() function
(http://finzi.psych.upenn.edu/R-sig-mixed-models/2010q2/003778.html)
that seems to be an extension of glmer() similar to your glm() hack,
but for mixed models. This allows for several random effects in a
glmer() fashion. On the other hand, with clmm() you get a variety of
link functions as well as extensions to scale and nominal effects for
modelling non-proportional effects, structured thresholds, etc.
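For concreteness, a minimal sketch of the calls (the grouping factor Id
in the clmm() call is hypothetical; your example data contain no
grouping variable):

library(ordinal)
fm1 <- clm(Cat ~ X, data = Data)             ## same fit as polr(Cat ~ X, data = Data)
fm2 <- clmm(Cat ~ X + (1 | Id), data = Data) ## one random intercept; 'Id' is assumed
summary(fm2)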

The idea of reformulating the cumulative link model as a binomial
generalized linear model is good, but I haven't seen it described
before - can I ask which exercise, in which edition of Gelman & Hill
(2007), mentions this?

Cheers,
Rune

On 30 December 2010 23:27, Emmanuel Charpentier <emm.charpentier at free.fr> wrote:
> Dear list,
>
> Following a hint in an exercise in Gelman & Hill's (2007) regression
> textbook, I tried to understand how to build a function generalizing to
> mixed models what polr() (MASS) does for fixed-effects models.
>
> I started with a very simple artificial dataset (see at end). Using the
> simplest polr() call:
>
>> summary(polr(Cat~X, data=Data))
>
> Re-fitting to get Hessian
>
> Call:
> polr(formula = Cat ~ X, data = Data)
>
> Coefficients:
>  Value Std. Error t value
> X 9.557      1.821   5.247
>
> Intercepts:
>      Value   Std. Error t value
> C1|C2 14.8668  3.0447     4.8828
> C2|C3 24.2772  4.7119     5.1523
> C3|C4 34.2367  6.5861     5.1984
> C4|C5 43.3543  8.2390     5.2621
> C5|C6 53.8174 10.3240     5.2128
> C6|C7 63.5247 12.2352     5.1920
> C7|C8 72.5850 13.7899     5.2636
> C8|C9 82.2256 15.8144     5.1994
>
> Residual Deviance: 55.7395
> AIC: 73.7395
> Warning message:
> glm.fit: fitted probabilities numerically 0 or 1 occurred
>
> the following snippet:
>
> Glm.polr <- local({
>  ll <- length(lev <- levels(Data$Cat))
>  thr <- paste(lev[-ll], "|", lev[-1], sep = "")
>  Dataset <- do.call(rbind,
>                     lapply(2:ll,
>                            function(i) {
>                              data.frame(Thr = thr[i - 1],
>                                         Cat = Data$Cat >= lev[i],
>                                         Data[, names(Data)[!(names(Data) %in% c("Thr", "Cat"))]])}))
>  Dataset$Thr <- factor(Dataset$Thr)
>  ## one "intercept" per threshold, no global intercept:
>  glm(Cat ~ 0 + Thr + X, family = binomial(link = logit), data = Dataset)
> })
>
> allowed me to reproduce the standard results (up to numerical precision,
> with which I didn't tinker, and a couple of cosmetic details: the names
> and signs of the intercepts), both for central values and variability:
>
>> summary(Glm.polr)
>
> Call:
> glm(formula = Cat ~ 0 + Thr + X, family = binomial(link = logit),
>    data = Dataset)
>
> Deviance Residuals:
>     Min        1Q    Median        3Q       Max
> -2.09822  -0.00001   0.00000   0.00000   1.92014
>
> Coefficients:
>         Estimate Std. Error z value Pr(>|z|)
> ThrC1|C2  -14.858      3.050  -4.872 1.11e-06 ***
> ThrC2|C3  -24.263      4.722  -5.138 2.78e-07 ***
> ThrC3|C4  -34.217      6.600  -5.184 2.17e-07 ***
> ThrC4|C5  -43.329      8.257  -5.247 1.54e-07 ***
> ThrC5|C6  -53.786     10.346  -5.199 2.01e-07 ***
> ThrC6|C7  -63.489     12.260  -5.179 2.24e-07 ***
> ThrC7|C8  -72.542     13.820  -5.249 1.53e-07 ***
> ThrC8|C9  -82.174     15.852  -5.184 2.18e-07 ***
> X           9.551      1.825   5.233 1.67e-07 ***
> ---
> Signif. codes:  0  ***  0.001  **  0.01  *  0.05  .  0.1     1
>
> (Dispersion parameter for binomial family taken to be 1)
>
>    Null deviance: 1109.035  on 800  degrees of freedom
> Residual deviance:   55.737  on 791  degrees of freedom
> AIC: 73.737
>
> Number of Fisher Scoring iterations: 12
>
> One should note that "large" thresholds (in absolute value) have larger
> standard errors, which is probably an artifact that I don't know how to
> get rid of. The glm claim of "800 degrees of freedom" is absurd: I
> have only 100 observations. More on this below...
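> (The 800 comes from the data expansion itself: each of the 100
> observations contributes one binary pseudo-observation per threshold,
>
> nrow(Data) * (nlevels(Data$Cat) - 1)  ## 100 * 8 = 800 rows in Dataset
>
> and 800 minus the 9 estimated coefficients leaves the 791 residual
> degrees of freedom reported above.)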
>
> Now, extending this to mixed models (using lme4's notation as a model)
> is not entirely trivial:
>
> The "fixed effect" part of the formula must *not* have an intercept
> (that is what the thresholds become): how do we ensure this?
>
> In the "random effect" part, the intercepts can be replaced by the
> thresholds (one can think of legitimate reasons why thresholds
> should depend on a random factor...). Should that also be done for
> implicit intercepts? E.g. (X | Id) implicitly means, in a GLMM,
> "variation of the slope of X AND of the intercept according to Id";
> should that become "variation of the slope AND of the thresholds
> according to Id"?
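> On a Dataset expanded as in Glm.polr above, one hypothetical shape for
> the mixed-model analogue (a sketch only; Id is an assumed grouping
> factor, absent from the example data):
>
> library(lme4)
> ## random intercept only:
> glmer(Cat ~ 0 + Thr + X + (1 | Id), family = binomial, data = Dataset)
> ## random thresholds instead of a single random intercept:
> glmer(Cat ~ 0 + Thr + X + (0 + Thr | Id), family = binomial, data = Dataset)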
>
> How should the tinkering with the original data be handled?
> Systematically copying the original data can be heavy, but is safe. Can
> the built-in "copy-on-write" mechanisms of R be sufficient?
>
> What "other" models would be useful in such a function? Cloglog?
> "Continuation"? Others?
>
> More generally, shouldn't such a function (polrer?) be part of some
> hypothetical (Platonic?) lme4? Ordered data are important in many
> cases where numerical expressions cannot be trusted, and the ability to
> fit mixed models to such data would be important. But doing this on a
> case-by-case basis is somewhat error-prone (especially when done by me :).
>
> For now, I cope with BUGS and a Bayesian interpretation. And herein lies
> a possibly important remark.
>
> My BUGS code (JAGS dialect, inspired by the "Bones" example of Classic
> BUGS vol. I) and the R call (through rjags) are at the end.
>
> The results call for some comments. Both graphical examination and some
> formal inspection show good convergence (the predictions are somewhat
> harder to stabilize...):
>
>> gelman.diag(Mod$Mod.coda[,-(1:100)]) ## First 100 columns are
>                                       ## predicted values for the
>                                       ## original data.
> Potential scale reduction factors:
>
>         Point est. 97.5% quantile
> beta.1         1.00           1.00
> deviance       1.00           1.00
> gamma[1]       1.00           1.00
> gamma[2]       1.00           1.00
> gamma[3]       1.00           1.01
> gamma[4]       1.00           1.00
> gamma[5]       1.00           1.00
> gamma[6]       1.00           1.00
> gamma[7]       1.00           1.00
> gamma[8]       1.00           1.00
>
> Multivariate psrf
>
> 1.00
>
> The fitted values and their variability are cosmetically quite different
> (the Bayesian model's gamma is the equivalent of the Threshold/X
> coefficient of the polr/glm frequentist model), but their central values
> are quite consistent with the previous ones:
>
>> summary(Mod$Mod.coda[,-(1:100)])
>
> Iterations = 11010:21000
> Thinning interval = 10
> Number of chains = 3
> Sample size per chain = 1000
>
> 1. Empirical mean and standard deviation for each variable,
>   plus standard error of the mean:
>
>           Mean      SD Naive SE Time-series SE
> beta.1    8.927 1.68312 0.030729       0.027433
> deviance 64.674 4.26684 0.077902       0.087455
> gamma[1]  1.544 0.08815 0.001609       0.001649
> gamma[2]  2.544 0.10957 0.002000       0.002065
> gamma[3]  3.580 0.09930 0.001813       0.001708
> gamma[4]  4.544 0.08445 0.001542       0.001607
> gamma[5]  5.630 0.11998 0.002191       0.002137
> gamma[6]  6.627 0.12406 0.002265       0.002463
> gamma[7]  7.606 0.11175 0.002040       0.002019
> gamma[8]  8.608 0.18558 0.003388       0.003343
>
> 2. Quantiles for each variable:
>
>           2.5%    25%    50%    75%  97.5%
> beta.1    6.180  7.741  8.799  9.878 12.582
> deviance 58.271 61.458 64.116 67.112 74.836
> gamma[1]  1.363  1.489  1.548  1.604  1.706
> gamma[2]  2.333  2.469  2.541  2.616  2.765
> gamma[3]  3.386  3.512  3.582  3.647  3.779
> gamma[4]  4.385  4.487  4.543  4.600  4.713
> gamma[5]  5.406  5.545  5.630  5.713  5.870
> gamma[6]  6.358  6.550  6.633  6.715  6.851
> gamma[7]  7.392  7.529  7.603  7.682  7.821
> gamma[8]  8.266  8.470  8.608  8.738  8.967
>
> One should note, however, that, while the central value and sampling SD
> of beta are consistent with the polr/glm results, the SD/mean ratios of
> the thresholds are much smaller than with polr/glm, and this is
> reflected in the credible intervals. Furthermore, the SDs do not seem to
> increase as much with the absolute value of the central values. (Yes, I
> am aware that centering my X values would reduce this phenomenon, but I
> kept them uncentered to illustrate my point.)
>
> For example, the 0.95 confidence interval of the C1|C2 threshold is about
> 6.1 wide (about 41% of the mean), whereas the 0.95 credible interval of
> gamma[1] is (1.36, 1.70), 0.34 wide, which is about 22% of its mean. For
> the C8|C9 threshold, the CI95/mean ratio is about 37%, while the
> CrI95/mean ratio of gamma[8] is about 8.1%.
>
> Given that the two models are respectively a frequentist and a Bayesian
> fit of the very same probability model, this is perplexing.
>
> Are there reasons to suspect that the SDs reported by glm and/or polr are
> inflated (for example, these models do not seem to account for the re-use
> of the same data for the estimation of the various thresholds)? Or is
> there a (maybe not so subtle) error in the Bayesian interpretation of the
> model in BUGS?
>
> Or am I barking up the wrong tree?
>
> Any ideas would be welcome...
>
> Sincerely yours,
>
>                                        Emmanuel Charpentier
>
> The artificial data :
> Data <-
> structure(list(X.real = c(3.49764437862872, 6.90659633160501,
> 2.49335390384994, 7.24293843863257, 0.803964727576124, 4.04213878877333,
> 1.06214939381180, 0.554054259416088, 4.08729099177371, 2.52629057633986,
> 3.24127380430347, 1.11891797458651, 0.530158326675324, 9.13663197362062,
> 4.50887509479721, 8.28526961997477, 4.88346597634514, 7.1687905770683,
> 8.30770739436844, 4.10970688126136, 0.926142793485848, 1.60153659527237,
> 7.9980129347903, 7.09517767824342, 3.13154316634710, 3.73947230927693,
> 2.80221854118421, 6.59984440492743, 2.61928305396566, 7.01599863310761,
> 2.24968126253923, 1.58829945203618, 1.90902984485251, 5.12920046748922,
> 9.2118070532461, 9.08208307216233, 4.43550132482185, 8.05729871351512,
> 1.18297174176127, 8.46910981252808, 4.43676303030525, 6.4418454794069,
> 4.53446029962874, 0.699769024165557, 8.46097165332483, 4.53799448354545,
> 5.27431575051651, 7.54335240307173, 2.17309956012568, 5.88721264347949,
> 9.4071993099453, 1.88089249201845, 5.3721556593878, 0.771888320064126,
> 8.1072438324685, 6.31264339382069, 2.00734973388480, 1.55494366180340,
> 7.02181982951623, 2.30599238507652, 0.527109325928194, 3.66541005952039,
> 5.5545317236498, 1.09950173374848, 0.589535171670769, 3.53540683843517,
> 4.45889129379049, 6.80170522713363, 7.37081649369139, 9.4889221652781,
> 7.19282466718394, 6.13643707306052, 1.35088176999453, 3.16405426710045,
> 3.32833977017296, 6.08074276163767, 1.63873091713528, 1.46557732912316,
> 4.91770214161358, 0.938513984685997, 1.48740958479596, 2.01597792723018,
> 4.21291073575364, 5.43887605123115, 4.70221174192061, 1.67668472528311,
> 5.24846045233372, 9.0111473276549, 7.23671006856042, 3.72080675801418,
> 4.39171014756293, 4.81283564404103, 2.41342165250038, 3.82321065255632,
> 7.669782577978, 9.2234752201292, 0.641411897606775, 1.13470612170815,
> 1.02966988829139, 1.24689684186557), Cat = structure(c(3L, 7L,
> 2L, 7L, 1L, 4L, 1L, 1L, 4L, 3L, 3L, 1L, 1L, 9L, 5L, 8L, 5L, 7L,
> 8L, 4L, 1L, 2L, 8L, 7L, 3L, 4L, 3L, 7L, 3L, 7L, 2L, 2L, 2L, 5L,
> 9L, 9L, 4L, 8L, 1L, 8L, 4L, 6L, 5L, 1L, 8L, 5L, 5L, 8L, 2L, 6L,
> 9L, 2L, 5L, 1L, 8L, 6L, 2L, 2L, 7L, 2L, 1L, 4L, 6L, 1L, 1L, 4L,
> 4L, 7L, 7L, 9L, 7L, 6L, 1L, 3L, 3L, 6L, 2L, 1L, 5L, 1L, 1L, 2L,
> 4L, 5L, 5L, 2L, 5L, 9L, 7L, 4L, 4L, 5L, 2L, 4L, 8L, 9L, 1L, 1L,
> 1L, 1L), .Label = c("C1", "C2", "C3", "C4", "C5", "C6", "C7",
> "C8", "C9"), class = c("ordered", "factor")), X = c(3.78512255820666,
> 7.03710649735623, 2.37579547880536, 7.01687069036889, 0.572774665466921,
> 4.53333128338774, 0.92564311992818, 0.682074690286214, 4.17197303371935,
> 2.62951212025962, 3.44249595134830, 1.02688214798057, 0.440959503725193,
> 9.05785834195666, 4.52616407157106, 8.07448237191856, 4.95371782896727,
> 7.35730414103575, 8.07638469674326, 4.26407495806574, 1.06385684713162,
> 1.63333280184271, 7.9246178454581, 7.41541196525404, 3.12459209709868,
> 3.64101167776984, 2.51917629411974, 6.8118629850261, 2.77786400733997,
> 7.36566898280035, 2.03009770691198, 1.61723070290973, 1.96657910701987,
> 5.22600151472831, 9.21386030948403, 9.18902347324618, 4.39493834305473,
> 8.27266371630105, 1.15477068640133, 8.15280513569419, 4.6734050583675,
> 6.51664088906166, 4.40955269905174, 0.54646886424704, 8.03605092676131,
> 4.36779957410513, 5.44860695752003, 7.83429549417553, 1.84916430642593,
> 5.81109382387844, 9.32133297147632, 1.73276033539495, 5.3490390105937,
> 0.695781216499173, 8.05603266764859, 6.67741469458807, 2.03017543225991,
> 1.60754713124494, 6.63481861913185, 2.20004584719421, 0.675994410762504,
> 3.98953568167253, 5.45631429591027, 0.675175153754915, 0.305522085250977,
> 3.58440905189674, 4.58557708778191, 6.99911313460102, 7.21102458968007,
> 9.44650050070282, 7.33311596756764, 5.87485729872331, 1.11824133536435,
> 3.40701638829918, 3.36818213997361, 5.92742193489156, 2.08701427718401,
> 1.73709762230381, 4.71589786549998, 0.980248052739725, 1.58957599464140,
> 2.17701413093404, 4.22999685414555, 5.84947271423299, 4.94396458250251,
> 1.65133703600731, 5.17336780283038, 8.8993318875765, 7.29238547003421,
> 3.62020801752199, 4.28550112286315, 4.95543088867053, 2.61555021108721,
> 3.85711807608966, 7.67091954093986, 9.24609859359625, 0.436967673925767,
> 1.32063426377087, 0.794911931890139, 1.16453130413352)), .Names =
> c("X.real", "Cat", "X"), row.names = c(NA, -100L), class = "data.frame")
>
> The R/BUGS Bayesian model :
>
> system.time(Mod<-local({
>  Mod<-function(){ ## BUGS model as an R function. Easy to manage.
>    ## Thresholds.
>    for (j in 1:ncut) {
>      gamma.raw[j]~dnorm(0, 1.0E-4)
>    }
>    gamma<-sort(gamma.raw)
>    ## p[i,j]=Pr(Cat[i]>j)
>    ## q[i,j]=Pr(Cat[i]=j)
>    ##       =Pr(Cat[i]>j-1)-Pr(Cat[i]>j)
>    ##       =p[i,j-1]-p[i,j]
>    for (i in 1:nobs) {
>      Cat[i]~dcat(q[i,1:(ncut+1)])
>      ## Predictions
>      Cat.new[i]~dcat(q[i,1:(ncut+1)])
>      q[i,1]<-1-p[i,1]
>      for (j in 2:ncut) {
>        q[i,j]<-p[i,j-1]-p[i,j]
>      }
>      q[i,ncut+1]<-p[i,ncut]
>      for (j in 1:ncut) {
>        logit(p[i,j])<-beta.1*(X[i]-gamma[j])
>      }
>    }
>    beta.1~dnorm(0, 1.0E-4)
>  }
>  tmpf<-tempfile()
>  write.model(Mod, tmpf) ## Auxiliary function writing a function body
>  Mod.jags<-jags.model(tmpf,
>                       data=with(Data, {
>                         Y<-outer(Cat, levels(Cat)[-1], ">=")
>                         list(X=X,
>                              Cat=unclass(Cat),
>                              nobs=nrow(Y),
>                              ncut=ncol(Y))}),
>                       ## "Reasonable" (not too overdispersed) initial
>                       ## values are *CRUCIAL* to model initialisation
>                       inits=function() {
>                         list(gamma.raw=sort(runif(8,min=-1,max=1)),
>                              beta.1=runif(1,min=0,max=1))
>                       },
>                       n.chains=3)
>  update(Mod.jags, n.iter=10000)
>  Mod.coda<-coda.samples(Mod.jags,
>                         variable.names=c("gamma",
>                           ## "beta.0",
>                           "beta.1",
>                           ## "lzeta",
>                           ## "pi.sgn",
>                           "deviance",
>                           "Cat.new"),
>                         thin=10,
>                         n.iter=10000)
>  list(Mod.jags=Mod.jags, Mod.coda=Mod.coda)
> }))
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>

--
Rune Haubo Bojesen Christensen

PhD Student, M.Sc. Eng.
Phone: (+45) 45 25 33 63
Mobile: (+45) 30 26 45 54

DTU Informatics, Section for Statistics
Technical University of Denmark, Build. 305, Room 122,
DK-2800 Kgs. Lyngby, Denmark

```