# [R-sig-ME] Mixed-model polytomous ordered logistic regression ?

Emmanuel Charpentier emm.charpentier at free.fr
Thu Dec 30 23:27:41 CET 2010

Dear list,

Following a hint from an exercise in Gelman & Hill's (2007) regression
textbook, I tried to work out how to build a function generalizing to
mixed models what polr() (MASS) does for fixed-effects models.

I started with a very simple artificial dataset (see at the end). Using
the simplest polr() call:

> summary(polr(Cat~X, data=Data))

Re-fitting to get Hessian

Call:
polr(formula = Cat ~ X, data = Data)

Coefficients:
Value Std. Error t value
X 9.557      1.821   5.247

Intercepts:
Value   Std. Error t value
C1|C2 14.8668  3.0447     4.8828
C2|C3 24.2772  4.7119     5.1523
C3|C4 34.2367  6.5861     5.1984
C4|C5 43.3543  8.2390     5.2621
C5|C6 53.8174 10.3240     5.2128
C6|C7 63.5247 12.2352     5.1920
C7|C8 72.5850 13.7899     5.2636
C8|C9 82.2256 15.8144     5.1994

Residual Deviance: 55.7395
AIC: 73.7395
Warning message:
glm.fit: fitted probabilities numerically 0 or 1 occurred

the following snippet:

Glm.polr <- local({
    ll <- length(lev <- levels(Data$Cat))
    thr <- paste(lev[-ll], "|", lev[-1], sep = "")
    Dataset <- do.call(rbind,
                       lapply(2:ll,
                              function(i) {
                                  data.frame(Thr = thr[i - 1],
                                             Cat = Data$Cat >= lev[i],
                                             Data[, names(Data)[!(names(Data) %in% c("Thr", "Cat"))]])
                              }))
    Dataset$Thr <- factor(Dataset$Thr)
    glm(Cat ~ 0 + Thr + X, data = Dataset, family = binomial(link = logit))
})

allowed me to reproduce this reference fit (up to numerical precision,
which I didn't tinker with, and a couple of cosmetic details: the names
and signs of the intercepts), both in central values and in variability:

> summary(Glm.polr)

Call:
glm(formula = Cat ~ 0 + Thr + X, family = binomial(link = logit),
data = Dataset)

Deviance Residuals:
Min        1Q    Median        3Q       Max
-2.09822  -0.00001   0.00000   0.00000   1.92014

Coefficients:
Estimate Std. Error z value Pr(>|z|)
ThrC1|C2  -14.858      3.050  -4.872 1.11e-06 ***
ThrC2|C3  -24.263      4.722  -5.138 2.78e-07 ***
ThrC3|C4  -34.217      6.600  -5.184 2.17e-07 ***
ThrC4|C5  -43.329      8.257  -5.247 1.54e-07 ***
ThrC5|C6  -53.786     10.346  -5.199 2.01e-07 ***
ThrC6|C7  -63.489     12.260  -5.179 2.24e-07 ***
ThrC7|C8  -72.542     13.820  -5.249 1.53e-07 ***
ThrC8|C9  -82.174     15.852  -5.184 2.18e-07 ***
X           9.551      1.825   5.233 1.67e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 1109.035  on 800  degrees of freedom
Residual deviance:   55.737  on 791  degrees of freedom
AIC: 73.737

Number of Fisher Scoring iterations: 12

One should note that "large" thresholds (in absolute value) have larger
standard errors, which is probably an artifact I don't know how to get
rid of. The glm claim of "800 degrees of freedom" is absurd: I have only
100 observations. More on this below...
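
That degrees-of-freedom count comes straight from the expansion: each of the 100 observations is duplicated once per threshold, giving 8 x 100 = 800 binary rows. A toy sketch (invented data, not the dataset above) showing the multiplication:

```r
# Toy illustration: expanding an ordered outcome with K levels into K-1
# binary indicators multiplies the row count by K-1, which is exactly
# where glm's inflated degrees of freedom come from.
set.seed(1)
n <- 100
toy <- data.frame(X = runif(n, 0, 10))
toy$Cat <- cut(toy$X + rnorm(n), breaks = c(-Inf, 2, 5, 8, Inf),
               labels = c("C1", "C2", "C3", "C4"), ordered_result = TRUE)
lev <- levels(toy$Cat)
K <- length(lev)
long <- do.call(rbind, lapply(2:K, function(i)
    data.frame(Thr = paste(lev[i - 1], lev[i], sep = "|"),
               Ind = toy$Cat >= lev[i],
               X   = toy$X)))
nrow(long)  # (K - 1) * n rows, from only n observations
```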

Now, extending this to mixed models (using lme4's notation as a model) is
not entirely trivial:

The "fixed effect" part of the formula must *not* have an intercept
(that is what the thresholds become): how should this be ensured?
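
One conceivable approach (a sketch, not a tested lme4 mechanism; the threshold factor name Thr is hypothetical) is to rewrite the user's formula so the intercept is suppressed and the threshold factor is prepended:

```r
# Sketch: force the fixed-effect part to have no intercept by rewriting
# the user's formula with update.formula; 0 suppresses the intercept and
# the hypothetical threshold factor Thr is prepended.
drop_intercept <- function(f) update(f, . ~ 0 + Thr + .)

f  <- Cat ~ X + Z          # what the user might write
f2 <- drop_intercept(f)    # intercept removed, Thr added
attr(terms(f2), "intercept")  # 0 means no intercept
```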

In the "random effect" part, the intercepts can be replaced by the
thresholds (one can think of legitimate reasons why thresholds should
depend on a random factor...). Should that also be done for implicit
intercepts? E.g. (X | Id) implicitly means, in a GLMM, "variation of the
slope of X AND of the intercept according to Id"; should that become
"variation of the slope AND of the thresholds according to Id"?
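
In lme4 notation, the two readings could be written as follows (just formula objects, nothing fitted; Ind, Thr and Id are hypothetical names from the expansion):

```r
# Two candidate random-effect specifications for the expanded data:
# either (X | Id) keeps its usual "random slope and intercept" meaning,
# or the random intercept is replaced by random thresholds.
f_intercept <- Ind ~ 0 + Thr + X + (1 + X | Id)       # random slope and intercept
f_threshold <- Ind ~ 0 + Thr + X + (0 + Thr + X | Id) # random slope and thresholds
```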

How should the tinkering with the original data be handled?
Systematically copying the original data can be heavy, but is safe. Are
R's built-in "copy-on-write" mechanisms sufficient?
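
One memory-lighter alternative (a sketch on toy data, with hypothetical column names) is to build the long data set by row indexing rather than by rbind-ing partial copies; R's copy-on-modify semantics then share the unchanged column data:

```r
# Sketch of a lighter expansion: index the original rows once instead of
# rbind-ing K-1 partial copies; only the added columns are new storage
# until an original column is actually modified.
set.seed(2)
toy <- data.frame(X = runif(20, 0, 10))
toy$Cat <- cut(toy$X, breaks = c(-Inf, 3, 6, Inf),
               labels = c("C1", "C2", "C3"), ordered_result = TRUE)
lev <- levels(toy$Cat); K <- length(lev)
n <- nrow(toy)
long <- toy[rep(seq_len(n), K - 1), , drop = FALSE]  # each row repeated per threshold
long$Thr <- factor(rep(paste(lev[-K], lev[-1], sep = "|"), each = n))
long$Ind <- long$Cat >= rep(lev[-1], each = n)
```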

What "other" models would be useful in such a function? Cloglog?
"Continuation-ratio"? Others?
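
Swapping links is just a change of family in the expanded glm. A toy sketch (invented data and names) with cloglog, which corresponds to a grouped proportional-hazards ordinal model:

```r
# Toy sketch: the same binary-expansion trick, fitted with both the
# logit link and a cloglog link by changing only the family argument.
set.seed(3)
n <- 200
toy <- data.frame(X = runif(n, 0, 10))
toy$Cat <- cut(toy$X + rnorm(n), c(-Inf, 3, 6, Inf),
               labels = c("C1", "C2", "C3"), ordered_result = TRUE)
lev <- levels(toy$Cat); K <- length(lev)
long <- do.call(rbind, lapply(2:K, function(i)
    data.frame(Thr = paste(lev[i - 1], lev[i], sep = "|"),
               Ind = toy$Cat >= lev[i], X = toy$X)))
fit_logit   <- glm(Ind ~ 0 + Thr + X, data = long,
                   family = binomial(link = "logit"))
fit_cloglog <- glm(Ind ~ 0 + Thr + X, data = long,
                   family = binomial(link = "cloglog"))
c(logit = coef(fit_logit)["X"], cloglog = coef(fit_cloglog)["X"])
```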

More generally, shouldn't such a function (polrer?) be part of some
hypothetical (Platonic?) lme4? Ordered data are important in many cases
where numerical scores cannot be trusted, and the ability to fit mixed
models to such data would be valuable. But doing this on a case-by-case
basis is somewhat error-prone (especially when done by me :).
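
The error-prone part could at least be factored into one reusable helper. A sketch (the name polr_expand and its interface are invented) returning the long data set ready for the 0 + Thr + ... glm:

```r
# Hypothetical helper: expand an ordered response into threshold-wise
# binary indicators, keeping all other columns of the data frame.
polr_expand <- function(response, data) {
    cat <- data[[response]]
    stopifnot(is.ordered(cat))
    lev <- levels(cat); K <- length(lev)
    thr <- paste(lev[-K], lev[-1], sep = "|")
    do.call(rbind, lapply(2:K, function(i)
        cbind(data.frame(Thr = factor(thr[i - 1], levels = thr),
                         Ind = cat >= lev[i]),
              data[setdiff(names(data), c(response, "Thr", "Ind"))])))
}

# Toy use:
toy <- data.frame(X = 1:6,
                  Cat = factor(c("a", "a", "b", "b", "c", "c"), ordered = TRUE))
long <- polr_expand("Cat", toy)
nrow(long)  # (3 - 1) * 6 = 12 rows
```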

For now, I make do with BUGS and a Bayesian interpretation. And herein
lies a possibly important remark.

My BUGS code (JAGS dialect, inspired by the "Bones" example of Classic
BUGS vol. I) and the R call (through rjags) are at the end.

The results call for some comments. Both graphical examination and some
formal inspection show good convergence (the predictions are somewhat
harder to stabilize...):

> gelman.diag(Mod$Mod.coda[,-(1:100)]) ## First 100 columns are
## predicted values for the
## original data.
Potential scale reduction factors:

Point est. 97.5% quantile
beta.1         1.00           1.00
deviance       1.00           1.00
gamma[1]       1.00           1.00
gamma[2]       1.00           1.00
gamma[3]       1.00           1.01
gamma[4]       1.00           1.00
gamma[5]       1.00           1.00
gamma[6]       1.00           1.00
gamma[7]       1.00           1.00
gamma[8]       1.00           1.00

Multivariate psrf

1.00

The fitted parameters and their variability look cosmetically quite
different (the Bayesian model's gamma is the equivalent of the ratio of
each threshold to the X coefficient in the polr/glm frequentist model),
but their central values are quite consistent with the previous ones:

> summary(Mod$Mod.coda[,-(1:100)])

Iterations = 11010:21000
Thinning interval = 10
Number of chains = 3
Sample size per chain = 1000

1. Empirical mean and standard deviation for each variable,
plus standard error of the mean:

Mean      SD Naive SE Time-series SE
beta.1    8.927 1.68312 0.030729       0.027433
deviance 64.674 4.26684 0.077902       0.087455
gamma[1]  1.544 0.08815 0.001609       0.001649
gamma[2]  2.544 0.10957 0.002000       0.002065
gamma[3]  3.580 0.09930 0.001813       0.001708
gamma[4]  4.544 0.08445 0.001542       0.001607
gamma[5]  5.630 0.11998 0.002191       0.002137
gamma[6]  6.627 0.12406 0.002265       0.002463
gamma[7]  7.606 0.11175 0.002040       0.002019
gamma[8]  8.608 0.18558 0.003388       0.003343

2. Quantiles for each variable:

2.5%    25%    50%    75%  97.5%
beta.1    6.180  7.741  8.799  9.878 12.582
deviance 58.271 61.458 64.116 67.112 74.836
gamma[1]  1.363  1.489  1.548  1.604  1.706
gamma[2]  2.333  2.469  2.541  2.616  2.765
gamma[3]  3.386  3.512  3.582  3.647  3.779
gamma[4]  4.385  4.487  4.543  4.600  4.713
gamma[5]  5.406  5.545  5.630  5.713  5.870
gamma[6]  6.358  6.550  6.633  6.715  6.851
gamma[7]  7.392  7.529  7.603  7.682  7.821
gamma[8]  8.266  8.470  8.608  8.738  8.967

One should note, however, that while the central value and sampling SD of
beta are consistent with the polr/glm results, the SD/mean ratios of the
thresholds are much smaller than with polr/glm, and this is reflected in
the credible intervals. Furthermore, the SDs do not seem to increase as
much with the absolute value of the central values. (Yes, I am aware that
centering my X values would reduce this phenomenon, but I kept them
uncentered to illustrate my point.)
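
The centering effect itself is easy to demonstrate on invented data: uncentered, the intercept is an extrapolation far outside the observed X range, so its standard error picks up the slope's uncertainty.

```r
# Toy check (invented data): centering X shrinks the standard error of
# the intercept in a logistic glm, since the intercept then sits at the
# middle of the data instead of at X = 0.
set.seed(4)
x <- runif(300, 0, 10)
y <- rbinom(300, 1, plogis(0.8 * (x - 5)))
se <- function(fit) coef(summary(fit))["(Intercept)", "Std. Error"]
se_raw      <- se(glm(y ~ x, family = binomial))
se_centered <- se(glm(y ~ I(x - mean(x)), family = binomial))
c(raw = se_raw, centered = se_centered)
```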

For example, the 0.95 confidence interval of the C1|C2 threshold is about
6.1 wide (about 41% of the mean), whereas the 0.95 credible interval of
gamma[1] is (1.36, 1.70), 0.34 wide, which is about 25% of its mean. For
the C8|C9 threshold, the CI95/mean ratio is about 37%, while the
CrI95/mean ratio for gamma[8] is 8.1%.

Given that the two models are respectively a frequentist and a Bayesian
fit of the very same probability model, this is perplexing.

Are there reasons to suspect that the SDs reported by glm and/or polr are
inflated (for example, these models do not seem to account for the re-use
of the same data in the estimation of the various thresholds)? Or is
there a (maybe not so subtle) error in the Bayesian interpretation of the
model in BUGS?

Or am I barking up the wrong tree?

Any idea would be welcome...

Sincerely yours,

Emmanuel Charpentier

The artificial data:
Data <-
structure(list(X.real = c(3.49764437862872, 6.90659633160501,
2.49335390384994, 7.24293843863257, 0.803964727576124, 4.04213878877333,
1.06214939381180, 0.554054259416088, 4.08729099177371, 2.52629057633986,
3.24127380430347, 1.11891797458651, 0.530158326675324, 9.13663197362062,
4.50887509479721, 8.28526961997477, 4.88346597634514, 7.1687905770683,
8.30770739436844, 4.10970688126136, 0.926142793485848, 1.60153659527237,
7.9980129347903, 7.09517767824342, 3.13154316634710, 3.73947230927693,
2.80221854118421, 6.59984440492743, 2.61928305396566, 7.01599863310761,
2.24968126253923, 1.58829945203618, 1.90902984485251, 5.12920046748922,
9.2118070532461, 9.08208307216233, 4.43550132482185, 8.05729871351512,
1.18297174176127, 8.46910981252808, 4.43676303030525, 6.4418454794069,
4.53446029962874, 0.699769024165557, 8.46097165332483, 4.53799448354545,
5.27431575051651, 7.54335240307173, 2.17309956012568, 5.88721264347949,
9.4071993099453, 1.88089249201845, 5.3721556593878, 0.771888320064126,
8.1072438324685, 6.31264339382069, 2.00734973388480, 1.55494366180340,
7.02181982951623, 2.30599238507652, 0.527109325928194, 3.66541005952039,
5.5545317236498, 1.09950173374848, 0.589535171670769, 3.53540683843517,
4.45889129379049, 6.80170522713363, 7.37081649369139, 9.4889221652781,
7.19282466718394, 6.13643707306052, 1.35088176999453, 3.16405426710045,
3.32833977017296, 6.08074276163767, 1.63873091713528, 1.46557732912316,
4.91770214161358, 0.938513984685997, 1.48740958479596, 2.01597792723018,
4.21291073575364, 5.43887605123115, 4.70221174192061, 1.67668472528311,
5.24846045233372, 9.0111473276549, 7.23671006856042, 3.72080675801418,
4.39171014756293, 4.81283564404103, 2.41342165250038, 3.82321065255632,
7.669782577978, 9.2234752201292, 0.641411897606775, 1.13470612170815,
1.02966988829139, 1.24689684186557), Cat = structure(c(3L, 7L,
2L, 7L, 1L, 4L, 1L, 1L, 4L, 3L, 3L, 1L, 1L, 9L, 5L, 8L, 5L, 7L,
8L, 4L, 1L, 2L, 8L, 7L, 3L, 4L, 3L, 7L, 3L, 7L, 2L, 2L, 2L, 5L,
9L, 9L, 4L, 8L, 1L, 8L, 4L, 6L, 5L, 1L, 8L, 5L, 5L, 8L, 2L, 6L,
9L, 2L, 5L, 1L, 8L, 6L, 2L, 2L, 7L, 2L, 1L, 4L, 6L, 1L, 1L, 4L,
4L, 7L, 7L, 9L, 7L, 6L, 1L, 3L, 3L, 6L, 2L, 1L, 5L, 1L, 1L, 2L,
4L, 5L, 5L, 2L, 5L, 9L, 7L, 4L, 4L, 5L, 2L, 4L, 8L, 9L, 1L, 1L,
1L, 1L), .Label = c("C1", "C2", "C3", "C4", "C5", "C6", "C7",
"C8", "C9"), class = c("ordered", "factor")), X = c(3.78512255820666,
7.03710649735623, 2.37579547880536, 7.01687069036889, 0.572774665466921,
4.53333128338774, 0.92564311992818, 0.682074690286214, 4.17197303371935,
2.62951212025962, 3.44249595134830, 1.02688214798057, 0.440959503725193,
9.05785834195666, 4.52616407157106, 8.07448237191856, 4.95371782896727,
7.35730414103575, 8.07638469674326, 4.26407495806574, 1.06385684713162,
1.63333280184271, 7.9246178454581, 7.41541196525404, 3.12459209709868,
3.64101167776984, 2.51917629411974, 6.8118629850261, 2.77786400733997,
7.36566898280035, 2.03009770691198, 1.61723070290973, 1.96657910701987,
5.22600151472831, 9.21386030948403, 9.18902347324618, 4.39493834305473,
8.27266371630105, 1.15477068640133, 8.15280513569419, 4.6734050583675,
6.51664088906166, 4.40955269905174, 0.54646886424704, 8.03605092676131,
4.36779957410513, 5.44860695752003, 7.83429549417553, 1.84916430642593,
5.81109382387844, 9.32133297147632, 1.73276033539495, 5.3490390105937,
0.695781216499173, 8.05603266764859, 6.67741469458807, 2.03017543225991,
1.60754713124494, 6.63481861913185, 2.20004584719421, 0.675994410762504,
3.98953568167253, 5.45631429591027, 0.675175153754915, 0.305522085250977,
3.58440905189674, 4.58557708778191, 6.99911313460102, 7.21102458968007,
9.44650050070282, 7.33311596756764, 5.87485729872331, 1.11824133536435,
3.40701638829918, 3.36818213997361, 5.92742193489156, 2.08701427718401,
1.73709762230381, 4.71589786549998, 0.980248052739725, 1.58957599464140,
2.17701413093404, 4.22999685414555, 5.84947271423299, 4.94396458250251,
1.65133703600731, 5.17336780283038, 8.8993318875765, 7.29238547003421,
3.62020801752199, 4.28550112286315, 4.95543088867053, 2.61555021108721,
3.85711807608966, 7.67091954093986, 9.24609859359625, 0.436967673925767,
1.32063426377087, 0.794911931890139, 1.16453130413352)), .Names = c
("X.real",
"Cat", "X"), row.names = c(NA, -100L), class = "data.frame")

The R/BUGS Bayesian model:

system.time(Mod <- local({
    Mod <- function() { ## BUGS model as an R function. Easy to manage.
        ## Thresholds.
        for (j in 1:ncut) {
            gamma.raw[j] ~ dnorm(0, 1.0E-4)
        }
        gamma <- sort(gamma.raw)
        ## p[i,j] = Pr(Cat[i] > j)
        ## q[i,j] = Pr(Cat[i] = j)
        ##        = Pr(Cat[i] > j-1) - Pr(Cat[i] > j)
        ##        = p[i,j-1] - p[i,j]
        for (i in 1:nobs) {
            Cat[i] ~ dcat(q[i, 1:(ncut+1)])
            ## Predictions
            Cat.new[i] ~ dcat(q[i, 1:(ncut+1)])
            q[i, 1] <- 1 - p[i, 1]
            for (j in 2:ncut) {
                q[i, j] <- p[i, j-1] - p[i, j]
            }
            q[i, ncut+1] <- p[i, ncut]
            for (j in 1:ncut) {
                logit(p[i, j]) <- beta.1 * (X[i] - gamma[j])
            }
        }
        beta.1 ~ dnorm(0, 1.0E-4)
    }
    tmpf <- tempfile()
    write.model(Mod, tmpf) ## Auxiliary function writing a function body
    Mod.jags <- jags.model(tmpf,
                           data = with(Data, {
                               Y <- outer(Cat, levels(Cat)[-1], ">=")
                               list(X = X,
                                    Cat = unclass(Cat),
                                    nobs = nrow(Y),
                                    ncut = ncol(Y))}),
                           ## "Reasonable" (not too overdispersed) initial
                           ## values are *CRUCIAL* to model initialisation
                           inits = function() {
                               list(gamma.raw = sort(runif(8, min = -1, max = 1)),
                                    beta.1 = runif(1, min = 0, max = 1))
                           },
                           n.chains = 3)
    update(Mod.jags, n.iter = 10000)
    unlink(tmpf)
    Mod.coda <- coda.samples(Mod.jags,
                             variable.names = c("gamma",
                                                ## "beta.0",
                                                "beta.1",
                                                ## "gamma.adj",
                                                ## "lzeta",
                                                ## "pi.sgn",
                                                "deviance",
                                                "Cat.new"),
                             thin = 10,
                             n.iter = 10000)
    list(Mod.jags = Mod.jags, Mod.coda = Mod.coda)
}))
