# [R-sig-ME] Mixed-model polytomous ordered logistic regression ?

Emmanuel Charpentier emm.charpentier at free.fr
Thu Dec 30 23:27:41 CET 2010

```
Dear list,

Following a hint in an exercise of Gelman & Hill's (2007) regression
textbook, I tried to understand how to build a function generalizing to
mixed models what polr() (from MASS) does for fixed-effects models.

I started with a very simple artificial dataset (see the end of this
message). Using the simplest possible polr() call:

> summary(polr(Cat~X, data=Data))

Re-fitting to get Hessian

Call:
polr(formula = Cat ~ X, data = Data)

Coefficients:
  Value Std. Error t value
X 9.557      1.821   5.247

Intercepts:
      Value   Std. Error t value
C1|C2 14.8668  3.0447     4.8828
C2|C3 24.2772  4.7119     5.1523
C3|C4 34.2367  6.5861     5.1984
C4|C5 43.3543  8.2390     5.2621
C5|C6 53.8174 10.3240     5.2128
C6|C7 63.5247 12.2352     5.1920
C7|C8 72.5850 13.7899     5.2636
C8|C9 82.2256 15.8144     5.1994

Residual Deviance: 55.7395
AIC: 73.7395
Warning message:
glm.fit: fitted probabilities numerically 0 or 1 occurred

the following snippet, which recodes the ordered outcome as a set of
threshold indicators and fits them as a binomial GLM on an expanded
dataset:

Glm.polr <- local({
    ## One pseudo-observation per (original row, threshold) pair:
    ## within each block, Cat becomes the indicator "above this threshold".
    ll <- length(lev <- levels(Data$Cat))
    thr <- paste(lev[-ll], "|", lev[-1], sep="")
    Dataset <- do.call(rbind,
                       lapply(2:ll,
                              function(i) {
                                  data.frame(Thr=thr[i-1],
                                             Cat=Data$Cat >= lev[i],
                                             Data[, names(Data)[!(names(Data) %in%
                                                                  c("Thr", "Cat"))]])}))
    Dataset$Thr <- factor(Dataset$Thr)
    ## No global intercept: one "threshold intercept" per dichotomization.
    glm(Cat ~ 0 + Thr + X, family=binomial(link=logit), data=Dataset)
})

allowed me to reproduce my reference fit (up to numerical precision, with
which I didn't tinker, and a couple of cosmetic details: the names and
signs of the intercepts), both for the central values and their
variability:

> summary(Glm.polr)

Call:
glm(formula = Cat ~ 0 + Thr + X, family = binomial(link = logit),
data = Dataset)

Deviance Residuals:
Min        1Q    Median        3Q       Max
-2.09822  -0.00001   0.00000   0.00000   1.92014

Coefficients:
Estimate Std. Error z value Pr(>|z|)
ThrC1|C2  -14.858      3.050  -4.872 1.11e-06 ***
ThrC2|C3  -24.263      4.722  -5.138 2.78e-07 ***
ThrC3|C4  -34.217      6.600  -5.184 2.17e-07 ***
ThrC4|C5  -43.329      8.257  -5.247 1.54e-07 ***
ThrC5|C6  -53.786     10.346  -5.199 2.01e-07 ***
ThrC6|C7  -63.489     12.260  -5.179 2.24e-07 ***
ThrC7|C8  -72.542     13.820  -5.249 1.53e-07 ***
ThrC8|C9  -82.174     15.852  -5.184 2.18e-07 ***
X           9.551      1.825   5.233 1.67e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 1109.035  on 800  degrees of freedom
Residual deviance:   55.737  on 791  degrees of freedom
AIC: 73.737

Number of Fisher Scoring iterations: 12

One should note that "large" thresholds (in absolute value) have larger
standard errors; this is probably an artifact, which I don't know how to
get rid of. The glm claim of "800 degrees of freedom" is ridiculous: I
have only 100 observations. More on this below...
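
(A quick check of where the 800 comes from: the expansion creates one
pseudo-observation per (original row, threshold) pair:

> nrow(Data) * (nlevels(Data$Cat) - 1)
[1] 800
)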

Now, extending this to mixed models (using lme4's notation as a model) is
not absolutely trivial:

The "fixed effect" part of the formula must *not* have intercepts (that's
what the thresholds become) : how to ensure this ?
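
A minimal sketch of one possibility, manipulating the user-supplied
formula with update() (drop.intercept is a hypothetical helper name, not
anything existing in lme4):

drop.intercept <- function(f) update(f, . ~ . - 1)
drop.intercept(Cat ~ X)  ## gives Cat ~ X - 1, i.e. no intercept column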

In the "random effect" part, the intercepts can be replaced by the
thresholds (one can think of legitimate reasons for which thresholds
should depend on a random factor...). Should that be done also for
implicit thresholds (e. g. (X | Id) means, in a GLM (implicitely)
"variation of slope of X AND intercept according to Id". Should that
become "variation of slope AND threshold according to Id" ?).

How should the tinkering with the original data be handled?
Systematically copying the original data can be heavy, but is safe. Could
R's built-in "copy-on-write" mechanisms be sufficient?
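
They might be: R copies on modification, not on mere assignment, as a
quick tracemem() check illustrates (tracemem() requires a build with
memory profiling enabled, which the standard binaries have):

x <- data.frame(a=1:3)
tracemem(x)  ## start tracing duplications of x
y <- x       ## no duplication yet: x and y share the same memory
y$a[1] <- 0  ## tracemem reports a copy only at this modification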

What "other" models would be useful in such a function ? Cloglog ?
"Continuation" ? Others ?

More generally, shouldn't such a function (polrer?) be part of some
hypothetical (Platonic?) lme4? Ordered data are important in many cases
where numerical expressions cannot be trusted, and the ability to fit
mixed models to such data would be important. But doing this on a
case-by-case basis is somewhat error-prone (especially when done by me :).

For now, I cope with BUGS and a Bayesian interpretation. And herein lies
a possibly important remark.

My BUGS code (JAGS dialect, inspired by the "Bones" example of Classic
BUGS vol. I) and the R call (through rjags) are at the end.

The results call for some comments. Both graphical examination and some
formal inspection show good convergence (the predictions are somewhat
harder to stabilize...):

> gelman.diag(Mod$Mod.coda[,-(1:100)]) ## first 100 columns are the
                                       ## predicted values for the
                                       ## original data
Potential scale reduction factors:

         Point est. 97.5% quantile
beta.1         1.00           1.00
deviance       1.00           1.00
gamma[1]       1.00           1.00
gamma[2]       1.00           1.00
gamma[3]       1.00           1.01
gamma[4]       1.00           1.00
gamma[5]       1.00           1.00
gamma[6]       1.00           1.00
gamma[7]       1.00           1.00
gamma[8]       1.00           1.00

Multivariate psrf

1.00

The fitted values and their variability are cosmetically quite different
(the Bayesian model's gamma[j] is the equivalent of the ratio of each
polr/glm threshold to the X coefficient; e.g. 14.8668/9.557 ≈ 1.556,
close to gamma[1]'s posterior mean of 1.544 below), but the central
values are quite consistent with the previous ones:

> summary(Mod$Mod.coda[,-(1:100)])

Iterations = 11010:21000
Thinning interval = 10
Number of chains = 3
Sample size per chain = 1000

1. Empirical mean and standard deviation for each variable,
plus standard error of the mean:

             Mean      SD Naive SE Time-series SE
beta.1      8.927 1.68312 0.030729       0.027433
deviance   64.674 4.26684 0.077902       0.087455
gamma[1]    1.544 0.08815 0.001609       0.001649
gamma[2]    2.544 0.10957 0.002000       0.002065
gamma[3]    3.580 0.09930 0.001813       0.001708
gamma[4]    4.544 0.08445 0.001542       0.001607
gamma[5]    5.630 0.11998 0.002191       0.002137
gamma[6]    6.627 0.12406 0.002265       0.002463
gamma[7]    7.606 0.11175 0.002040       0.002019
gamma[8]    8.608 0.18558 0.003388       0.003343

2. Quantiles for each variable:

           2.5%    25%    50%    75%  97.5%
beta.1    6.180  7.741  8.799  9.878 12.582
deviance 58.271 61.458 64.116 67.112 74.836
gamma[1]  1.363  1.489  1.548  1.604  1.706
gamma[2]  2.333  2.469  2.541  2.616  2.765
gamma[3]  3.386  3.512  3.582  3.647  3.779
gamma[4]  4.385  4.487  4.543  4.600  4.713
gamma[5]  5.406  5.545  5.630  5.713  5.870
gamma[6]  6.358  6.550  6.633  6.715  6.851
gamma[7]  7.392  7.529  7.603  7.682  7.821
gamma[8]  8.266  8.470  8.608  8.738  8.967

One should note, however, that while the central value and sampling SD of
beta are consistent with the polr/glm results, the SD/mean ratios of the
thresholds are much smaller than with polr/glm, and this is reflected in
the credible intervals. Furthermore, the SDs do not seem to increase as
much with the absolute value of the central values. (Yes, I am aware that
centering my X values would reduce this phenomenon, but I kept them
uncentered to illustrate my point.)

For example, the 0.95 confidence interval of the C1|C2 threshold is about
6.1 wide (about 41% of the mean), whereas the 0.95 credible interval of
gamma[1] is (1.36, 1.70), 0.34 wide, which is about 25% of its mean. For
the C8|C9 threshold, the CI95/mean ratio is about 37%, while the
CrI95/mean ratio of gamma[8] is 8.1%.
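
These ratios can be read directly off the coda summary; a sketch using
the objects above:

s <- summary(Mod$Mod.coda[, -(1:100)])
(s$quantiles[, "97.5%"] - s$quantiles[, "2.5%"]) / s$statistics[, "Mean"]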

Given that the two models are respectively a frequentist and a Bayesian
fit of the very same probability model, this is perplexing.

Are there reasons to suspect that the SDs reported by glm and/or polr are
inflated (for example, these models do not seem to account for the re-use
of the same data in the estimation of the various thresholds)? Or is
there a (maybe not so subtle) error in the Bayesian interpretation of the
model in BUGS?
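
A possible way to probe the first suspicion would be cluster-robust
standard errors, clustering the eight pseudo-observations built from each
original row (a sketch, assuming the sandwich and lmtest packages, and
relying on the row order produced by the expansion above):

library(sandwich)
library(lmtest)
id <- rep(seq_len(nrow(Data)), times=nlevels(Data$Cat) - 1)
coeftest(Glm.polr, vcov. = vcovCL(Glm.polr, cluster=id))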

Or am I barking up the wrong tree?

Any idea would be welcome...

Sincerely yours,

Emmanuel Charpentier

The artificial data:
Data <-
structure(list(X.real = c(3.49764437862872, 6.90659633160501,
2.49335390384994, 7.24293843863257, 0.803964727576124, 4.04213878877333,
1.06214939381180, 0.554054259416088, 4.08729099177371, 2.52629057633986,
3.24127380430347, 1.11891797458651, 0.530158326675324, 9.13663197362062,
4.50887509479721, 8.28526961997477, 4.88346597634514, 7.1687905770683,
8.30770739436844, 4.10970688126136, 0.926142793485848, 1.60153659527237,
7.9980129347903, 7.09517767824342, 3.13154316634710, 3.73947230927693,
2.80221854118421, 6.59984440492743, 2.61928305396566, 7.01599863310761,
2.24968126253923, 1.58829945203618, 1.90902984485251, 5.12920046748922,
9.2118070532461, 9.08208307216233, 4.43550132482185, 8.05729871351512,
1.18297174176127, 8.46910981252808, 4.43676303030525, 6.4418454794069,
4.53446029962874, 0.699769024165557, 8.46097165332483, 4.53799448354545,
5.27431575051651, 7.54335240307173, 2.17309956012568, 5.88721264347949,
9.4071993099453, 1.88089249201845, 5.3721556593878, 0.771888320064126,
8.1072438324685, 6.31264339382069, 2.00734973388480, 1.55494366180340,
7.02181982951623, 2.30599238507652, 0.527109325928194, 3.66541005952039,
5.5545317236498, 1.09950173374848, 0.589535171670769, 3.53540683843517,
4.45889129379049, 6.80170522713363, 7.37081649369139, 9.4889221652781,
7.19282466718394, 6.13643707306052, 1.35088176999453, 3.16405426710045,
3.32833977017296, 6.08074276163767, 1.63873091713528, 1.46557732912316,
4.91770214161358, 0.938513984685997, 1.48740958479596, 2.01597792723018,
4.21291073575364, 5.43887605123115, 4.70221174192061, 1.67668472528311,
5.24846045233372, 9.0111473276549, 7.23671006856042, 3.72080675801418,
4.39171014756293, 4.81283564404103, 2.41342165250038, 3.82321065255632,
7.669782577978, 9.2234752201292, 0.641411897606775, 1.13470612170815,
1.02966988829139, 1.24689684186557), Cat = structure(c(3L, 7L,
2L, 7L, 1L, 4L, 1L, 1L, 4L, 3L, 3L, 1L, 1L, 9L, 5L, 8L, 5L, 7L,
8L, 4L, 1L, 2L, 8L, 7L, 3L, 4L, 3L, 7L, 3L, 7L, 2L, 2L, 2L, 5L,
9L, 9L, 4L, 8L, 1L, 8L, 4L, 6L, 5L, 1L, 8L, 5L, 5L, 8L, 2L, 6L,
9L, 2L, 5L, 1L, 8L, 6L, 2L, 2L, 7L, 2L, 1L, 4L, 6L, 1L, 1L, 4L,
4L, 7L, 7L, 9L, 7L, 6L, 1L, 3L, 3L, 6L, 2L, 1L, 5L, 1L, 1L, 2L,
4L, 5L, 5L, 2L, 5L, 9L, 7L, 4L, 4L, 5L, 2L, 4L, 8L, 9L, 1L, 1L,
1L, 1L), .Label = c("C1", "C2", "C3", "C4", "C5", "C6", "C7",
"C8", "C9"), class = c("ordered", "factor")), X = c(3.78512255820666,
7.03710649735623, 2.37579547880536, 7.01687069036889, 0.572774665466921,
4.53333128338774, 0.92564311992818, 0.682074690286214, 4.17197303371935,
2.62951212025962, 3.44249595134830, 1.02688214798057, 0.440959503725193,
9.05785834195666, 4.52616407157106, 8.07448237191856, 4.95371782896727,
7.35730414103575, 8.07638469674326, 4.26407495806574, 1.06385684713162,
1.63333280184271, 7.9246178454581, 7.41541196525404, 3.12459209709868,
3.64101167776984, 2.51917629411974, 6.8118629850261, 2.77786400733997,
7.36566898280035, 2.03009770691198, 1.61723070290973, 1.96657910701987,
5.22600151472831, 9.21386030948403, 9.18902347324618, 4.39493834305473,
8.27266371630105, 1.15477068640133, 8.15280513569419, 4.6734050583675,
6.51664088906166, 4.40955269905174, 0.54646886424704, 8.03605092676131,
4.36779957410513, 5.44860695752003, 7.83429549417553, 1.84916430642593,
5.81109382387844, 9.32133297147632, 1.73276033539495, 5.3490390105937,
0.695781216499173, 8.05603266764859, 6.67741469458807, 2.03017543225991,
1.60754713124494, 6.63481861913185, 2.20004584719421, 0.675994410762504,
3.98953568167253, 5.45631429591027, 0.675175153754915, 0.305522085250977,
3.58440905189674, 4.58557708778191, 6.99911313460102, 7.21102458968007,
9.44650050070282, 7.33311596756764, 5.87485729872331, 1.11824133536435,
3.40701638829918, 3.36818213997361, 5.92742193489156, 2.08701427718401,
1.73709762230381, 4.71589786549998, 0.980248052739725, 1.58957599464140,
2.17701413093404, 4.22999685414555, 5.84947271423299, 4.94396458250251,
1.65133703600731, 5.17336780283038, 8.8993318875765, 7.29238547003421,
3.62020801752199, 4.28550112286315, 4.95543088867053, 2.61555021108721,
3.85711807608966, 7.67091954093986, 9.24609859359625, 0.436967673925767,
1.32063426377087, 0.794911931890139, 1.16453130413352)),
.Names = c("X.real", "Cat", "X"), row.names = c(NA, -100L),
class = "data.frame")

The R/BUGS Bayesian model:

library(rjags)
library(R2WinBUGS)  ## for write.model()
load.module("dic")  ## needed (I believe) to monitor "deviance" in JAGS

system.time(Mod <- local({
    Mod <- function() { ## BUGS model as an R function: easy to manage.
        ## Thresholds, kept ordered by sorting.
        for (j in 1:ncut) {
            gamma.raw[j] ~ dnorm(0, 1.0E-4)
        }
        gamma <- sort(gamma.raw)
        ## p[i,j] = Pr(Cat[i] > j)
        ## q[i,j] = Pr(Cat[i] = j)
        ##        = Pr(Cat[i] > j-1) - Pr(Cat[i] > j)
        ##        = p[i,j-1] - p[i,j]
        for (i in 1:nobs) {
            Cat[i] ~ dcat(q[i, 1:(ncut+1)])
            ## Predictions
            Cat.new[i] ~ dcat(q[i, 1:(ncut+1)])
            q[i, 1] <- 1 - p[i, 1]
            for (j in 2:ncut) {
                q[i, j] <- p[i, j-1] - p[i, j]
            }
            q[i, ncut+1] <- p[i, ncut]
            for (j in 1:ncut) {
                logit(p[i, j]) <- beta.1 * (X[i] - gamma[j])
            }
        }
        beta.1 ~ dnorm(0, 1.0E-4)
    }
    tmpf <- tempfile()
    write.model(Mod, tmpf) ## auxiliary function writing a function body
                           ## out as a BUGS model file
    Mod.jags <- jags.model(tmpf,
                           data=with(Data, {
                               Y <- outer(Cat, levels(Cat)[-1], ">=")
                               list(X=X,
                                    Cat=unclass(Cat),
                                    nobs=nrow(Y),
                                    ncut=ncol(Y))
                           }),
                           ## "Reasonable" (not too overdispersed) initial
                           ## values are *CRUCIAL* to model initialisation.
                           inits=function() {
                               list(gamma.raw=sort(runif(8, min=-1, max=1)),
                                    beta.1=runif(1, min=0, max=1))
                           },
                           n.chains=3)
    update(Mod.jags, n.iter=10000)
    Mod.coda <- coda.samples(Mod.jags,
                             variable.names=c("gamma",
                                              ## "beta.0",
                                              "beta.1",
                                              ## "lzeta",
                                              ## "pi.sgn",
                                              "deviance",
                                              "Cat.new"),
                             thin=10,
                             n.iter=10000)
    list(Mod.jags=Mod.jags, Mod.coda=Mod.coda)
}))

```