[R-sig-ME] Mixed-model polytomous ordered logistic regression ?
Emmanuel Charpentier
emm.charpentier at free.fr
Thu Dec 30 23:27:41 CET 2010
Dear list,
Following a hint from an exercise in Gelman & Hill's (2007) regression
textbook, I tried to understand how to build a function generalizing to
mixed models what polr() (MASS) does for fixed-effects models.
I started with a very simple artificial dataset (see at end). Using the
simplest call to polr():
> summary(polr(Cat~X, data=Data))
Re-fitting to get Hessian
Call:
polr(formula = Cat ~ X, data = Data)
Coefficients:
Value Std. Error t value
X 9.557 1.821 5.247
Intercepts:
Value Std. Error t value
C1|C2 14.8668 3.0447 4.8828
C2|C3 24.2772 4.7119 5.1523
C3|C4 34.2367 6.5861 5.1984
C4|C5 43.3543 8.2390 5.2621
C5|C6 53.8174 10.3240 5.2128
C6|C7 63.5247 12.2352 5.1920
C7|C8 72.5850 13.7899 5.2636
C8|C9 82.2256 15.8144 5.1994
Residual Deviance: 55.7395
AIC: 73.7395
Warning message:
glm.fit: fitted probabilities numerically 0 or 1 occurred
The following snippet:
Glm.polr <- local({
    ll <- length(lev <- levels(Data$Cat))
    thr <- paste(lev[-ll], "|", lev[-1], sep="")
    ## Stack one copy of the data per threshold: the binary response is
    ## Cat >= lev[i], and the factor Thr identifies the threshold.
    Dataset <- do.call(rbind,
                       lapply(2:ll,
                              function(i) {
                                  data.frame(Thr=thr[i-1],
                                             Cat=Data$Cat >= lev[i],
                                             Data[, names(Data)[!(names(Data) %in% c("Thr", "Cat"))]])
                              }))
    Dataset$Thr <- factor(Dataset$Thr)
    glm(Cat ~ 0 + Thr + X, data=Dataset, family=binomial(link=logit))
})
allowed me to reproduce the polr() results (up to numerical precision,
which I did not tinker with, and a couple of cosmetic details: names and
signs of the intercepts), both for the central values and the variability:
> summary(Glm.polr)
Call:
glm(formula = Cat ~ 0 + Thr + X, family = binomial(link = logit),
data = Dataset)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.09822 -0.00001 0.00000 0.00000 1.92014
Coefficients:
Estimate Std. Error z value Pr(>|z|)
ThrC1|C2 -14.858 3.050 -4.872 1.11e-06 ***
ThrC2|C3 -24.263 4.722 -5.138 2.78e-07 ***
ThrC3|C4 -34.217 6.600 -5.184 2.17e-07 ***
ThrC4|C5 -43.329 8.257 -5.247 1.54e-07 ***
ThrC5|C6 -53.786 10.346 -5.199 2.01e-07 ***
ThrC6|C7 -63.489 12.260 -5.179 2.24e-07 ***
ThrC7|C8 -72.542 13.820 -5.249 1.53e-07 ***
ThrC8|C9 -82.174 15.852 -5.184 2.18e-07 ***
X 9.551 1.825 5.233 1.67e-07 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 1109.035 on 800 degrees of freedom
Residual deviance: 55.737 on 791 degrees of freedom
AIC: 73.737
Number of Fisher Scoring iterations: 12
One should note that "large" thresholds (in absolute value) have larger
standard errors, which is probably an artifact I don't know how to get
rid of. The glm claim of "800 degrees of freedom" is ridiculous: I have
only 100 observations; the stacked Dataset merely contains 8 copies of
them, one per threshold. More on this below...
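As a quick programmatic check that the two fits agree, something like the
following sketch should do (it assumes Data and Glm.polr as defined above;
polr() stores its intercepts in the zeta component of the fit):

library(MASS)
polr.fit <- polr(Cat ~ X, data = Data, Hess = TRUE)
## Slope and thresholds from polr(), side by side with the slope and
## (sign-flipped) threshold coefficients of the expanded glm.
cbind(polr = c(coef(polr.fit), polr.fit$zeta),
      glm  = c(coef(Glm.polr)["X"],
               -coef(Glm.polr)[paste("Thr", names(polr.fit$zeta), sep="")]))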
Now, extending this to mixed models (using lme4's notation as a model) is
not entirely trivial:
The "fixed effect" part of the formula must *not* have an intercept (that
is what the thresholds become): how to ensure this? (See the sketch below.)
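A minimal sketch of one possible check, assuming lme4's nobars() helper
(which strips the (...|...) terms from a formula; exported in recent lme4
versions); the helper name has.fixed.intercept is mine, not an existing
API:

has.fixed.intercept <- function(f)
    ## terms() records the intercept as attribute 0 or 1 once the
    ## random-effect terms have been removed by nobars().
    attr(terms(lme4::nobars(f)), "intercept") == 1
has.fixed.intercept(Cat ~ 0 + X + (X | Id))  # FALSE: acceptable
has.fixed.intercept(Cat ~ X + (X | Id))      # TRUE: reject, or fix it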
In the "random effect" part, the intercepts can be replaced by the
thresholds (one can think of legitimate reasons for which thresholds
should depend on a random factor...). Should that be done also for
implicit thresholds (e. g. (X | Id) means, in a GLM (implicitely)
"variation of slope of X AND intercept according to Id". Should that
become "variation of slope AND threshold according to Id" ?).
How should the tinkering with the original data be handled?
Systematically copying the original data can be heavy, but is safe. Can
the built-in "copy-on-write" mechanism of R be sufficient?
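R's copy-on-modify behaviour can be checked with tracemem() (available
when R is compiled with memory profiling, as the CRAN binaries are); a
minimal illustration:

x <- data.frame(a = 1:5)
tracemem(x)     # mark the object
y <- x          # no copy yet: y and x share the same memory
y$a[1] <- 0L    # only now does tracemem report a copy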
What "other" models would be useful in such a function ? Cloglog ?
"Continuation" ? Others ?
More generally, shouldn't such a function (polrer?) be part of some
hypothetical (Platonic?) lme4? Ordered data are important in many cases
where numerical expressions cannot be trusted, and the ability to fit
mixed models to such data would be important. But doing this on a
case-by-case basis is somewhat error-prone (especially when done by me :).
For now, I cope with BUGS and a Bayesian interpretation. And herein lies
a possibly important remark.
My BUGS code (JAGS dialect, inspired by the "Bones" example of Classic
BUGS vol. I) and the R call (through rjags) are at the end.
The results call for some comments. Both graphical examination and some
formal inspection show good convergence (the predictions are somewhat
harder to stabilize...):
> gelman.diag(Mod$Mod.coda[,-(1:100)]) ## First 100 columns are
## predicted values for the
## original data.
Potential scale reduction factors:
Point est. 97.5% quantile
beta.1 1.00 1.00
deviance 1.00 1.00
gamma[1] 1.00 1.00
gamma[2] 1.00 1.00
gamma[3] 1.00 1.01
gamma[4] 1.00 1.00
gamma[5] 1.00 1.00
gamma[6] 1.00 1.00
gamma[7] 1.00 1.00
gamma[8] 1.00 1.00
Multivariate psrf
1.00
The fitted parameters and their variability are cosmetically quite
different (the Bayesian model's gamma is the equivalent of the ratio
threshold/X-coefficient of the polr/glm frequentist model), but their
central values are quite consistent with the previous ones:
> summary(Mod$Mod.coda[,-(1:100)])
Iterations = 11010:21000
Thinning interval = 10
Number of chains = 3
Sample size per chain = 1000
1. Empirical mean and standard deviation for each variable,
plus standard error of the mean:
Mean SD Naive SE Time-series SE
beta.1 8.927 1.68312 0.030729 0.027433
deviance 64.674 4.26684 0.077902 0.087455
gamma[1] 1.544 0.08815 0.001609 0.001649
gamma[2] 2.544 0.10957 0.002000 0.002065
gamma[3] 3.580 0.09930 0.001813 0.001708
gamma[4] 4.544 0.08445 0.001542 0.001607
gamma[5] 5.630 0.11998 0.002191 0.002137
gamma[6] 6.627 0.12406 0.002265 0.002463
gamma[7] 7.606 0.11175 0.002040 0.002019
gamma[8] 8.608 0.18558 0.003388 0.003343
2. Quantiles for each variable:
2.5% 25% 50% 75% 97.5%
beta.1 6.180 7.741 8.799 9.878 12.582
deviance 58.271 61.458 64.116 67.112 74.836
gamma[1] 1.363 1.489 1.548 1.604 1.706
gamma[2] 2.333 2.469 2.541 2.616 2.765
gamma[3] 3.386 3.512 3.582 3.647 3.779
gamma[4] 4.385 4.487 4.543 4.600 4.713
gamma[5] 5.406 5.545 5.630 5.713 5.870
gamma[6] 6.358 6.550 6.633 6.715 6.851
gamma[7] 7.392 7.529 7.603 7.682 7.821
gamma[8] 8.266 8.470 8.608 8.738 8.967
One should note, however, that, while the central value and sampling SD
of beta are consistent with the polr/glm results, the SD/mean ratios of
the thresholds are much smaller than with polr/glm, and this is reflected
in the credible intervals. Furthermore, the SDs do not seem to increase
as much with the absolute value of the central values. (Yes, I am aware
that centering my X values would reduce this phenomenon, but I kept them
uncentered to illustrate my point.)
For example, the 0.95 confidence interval of the C1|C2 threshold is about
6.1 wide (about 41% of the mean), whereas the 0.95 credible interval of
gamma[1] is (1.36, 1.70), 0.34 wide, which is about 25% of its mean. For
the C8|C9 threshold, the CI95/mean ratio is about 37%, while the
CrI95/mean ratio of gamma[8] is 8.1%.
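These width-to-mean ratios can also be computed directly from the fitted
objects rather than by hand; a sketch (it assumes polr.fit from above and
uses full interval widths throughout, so the absolute numbers may differ
from the figures just quoted, but the contrast between the two fits is
the same):

## Frequentist: asymptotic 95% CI width / |estimate| per threshold.
se <- sqrt(diag(vcov(polr.fit)))[names(polr.fit$zeta)]
2 * qnorm(0.975) * se / abs(polr.fit$zeta)
## Bayesian: 95% CrI width / |posterior mean| per gamma[j].
s <- summary(Mod$Mod.coda[, -(1:100)])
qs <- s$quantiles[grep("^gamma", rownames(s$quantiles)), ]
ms <- s$statistics[grep("^gamma", rownames(s$statistics)), "Mean"]
(qs[, "97.5%"] - qs[, "2.5%"]) / abs(ms)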
Given that the two models are respectively a frequentist and a Bayesian
fit of the very same probability model, this is perplexing.
Are there reasons to suspect that the SDs reported by glm and/or polr are
inflated (for example, these models do not seem to account for the re-use
of the same data for the estimation of the various thresholds)? Or is
there a (maybe not so subtle) error in the Bayesian interpretation of the
model in BUGS?
Or am I barking up the wrong tree?
Any ideas would be welcome...
Sincerely yours,
Emmanuel Charpentier
The artificial data:
Data <-
structure(list(X.real = c(3.49764437862872, 6.90659633160501,
2.49335390384994, 7.24293843863257, 0.803964727576124, 4.04213878877333,
1.06214939381180, 0.554054259416088, 4.08729099177371, 2.52629057633986,
3.24127380430347, 1.11891797458651, 0.530158326675324, 9.13663197362062,
4.50887509479721, 8.28526961997477, 4.88346597634514, 7.1687905770683,
8.30770739436844, 4.10970688126136, 0.926142793485848, 1.60153659527237,
7.9980129347903, 7.09517767824342, 3.13154316634710, 3.73947230927693,
2.80221854118421, 6.59984440492743, 2.61928305396566, 7.01599863310761,
2.24968126253923, 1.58829945203618, 1.90902984485251, 5.12920046748922,
9.2118070532461, 9.08208307216233, 4.43550132482185, 8.05729871351512,
1.18297174176127, 8.46910981252808, 4.43676303030525, 6.4418454794069,
4.53446029962874, 0.699769024165557, 8.46097165332483, 4.53799448354545,
5.27431575051651, 7.54335240307173, 2.17309956012568, 5.88721264347949,
9.4071993099453, 1.88089249201845, 5.3721556593878, 0.771888320064126,
8.1072438324685, 6.31264339382069, 2.00734973388480, 1.55494366180340,
7.02181982951623, 2.30599238507652, 0.527109325928194, 3.66541005952039,
5.5545317236498, 1.09950173374848, 0.589535171670769, 3.53540683843517,
4.45889129379049, 6.80170522713363, 7.37081649369139, 9.4889221652781,
7.19282466718394, 6.13643707306052, 1.35088176999453, 3.16405426710045,
3.32833977017296, 6.08074276163767, 1.63873091713528, 1.46557732912316,
4.91770214161358, 0.938513984685997, 1.48740958479596, 2.01597792723018,
4.21291073575364, 5.43887605123115, 4.70221174192061, 1.67668472528311,
5.24846045233372, 9.0111473276549, 7.23671006856042, 3.72080675801418,
4.39171014756293, 4.81283564404103, 2.41342165250038, 3.82321065255632,
7.669782577978, 9.2234752201292, 0.641411897606775, 1.13470612170815,
1.02966988829139, 1.24689684186557), Cat = structure(c(3L, 7L,
2L, 7L, 1L, 4L, 1L, 1L, 4L, 3L, 3L, 1L, 1L, 9L, 5L, 8L, 5L, 7L,
8L, 4L, 1L, 2L, 8L, 7L, 3L, 4L, 3L, 7L, 3L, 7L, 2L, 2L, 2L, 5L,
9L, 9L, 4L, 8L, 1L, 8L, 4L, 6L, 5L, 1L, 8L, 5L, 5L, 8L, 2L, 6L,
9L, 2L, 5L, 1L, 8L, 6L, 2L, 2L, 7L, 2L, 1L, 4L, 6L, 1L, 1L, 4L,
4L, 7L, 7L, 9L, 7L, 6L, 1L, 3L, 3L, 6L, 2L, 1L, 5L, 1L, 1L, 2L,
4L, 5L, 5L, 2L, 5L, 9L, 7L, 4L, 4L, 5L, 2L, 4L, 8L, 9L, 1L, 1L,
1L, 1L), .Label = c("C1", "C2", "C3", "C4", "C5", "C6", "C7",
"C8", "C9"), class = c("ordered", "factor")), X = c(3.78512255820666,
7.03710649735623, 2.37579547880536, 7.01687069036889, 0.572774665466921,
4.53333128338774, 0.92564311992818, 0.682074690286214, 4.17197303371935,
2.62951212025962, 3.44249595134830, 1.02688214798057, 0.440959503725193,
9.05785834195666, 4.52616407157106, 8.07448237191856, 4.95371782896727,
7.35730414103575, 8.07638469674326, 4.26407495806574, 1.06385684713162,
1.63333280184271, 7.9246178454581, 7.41541196525404, 3.12459209709868,
3.64101167776984, 2.51917629411974, 6.8118629850261, 2.77786400733997,
7.36566898280035, 2.03009770691198, 1.61723070290973, 1.96657910701987,
5.22600151472831, 9.21386030948403, 9.18902347324618, 4.39493834305473,
8.27266371630105, 1.15477068640133, 8.15280513569419, 4.6734050583675,
6.51664088906166, 4.40955269905174, 0.54646886424704, 8.03605092676131,
4.36779957410513, 5.44860695752003, 7.83429549417553, 1.84916430642593,
5.81109382387844, 9.32133297147632, 1.73276033539495, 5.3490390105937,
0.695781216499173, 8.05603266764859, 6.67741469458807, 2.03017543225991,
1.60754713124494, 6.63481861913185, 2.20004584719421, 0.675994410762504,
3.98953568167253, 5.45631429591027, 0.675175153754915, 0.305522085250977,
3.58440905189674, 4.58557708778191, 6.99911313460102, 7.21102458968007,
9.44650050070282, 7.33311596756764, 5.87485729872331, 1.11824133536435,
3.40701638829918, 3.36818213997361, 5.92742193489156, 2.08701427718401,
1.73709762230381, 4.71589786549998, 0.980248052739725, 1.58957599464140,
2.17701413093404, 4.22999685414555, 5.84947271423299, 4.94396458250251,
1.65133703600731, 5.17336780283038, 8.8993318875765, 7.29238547003421,
3.62020801752199, 4.28550112286315, 4.95543088867053, 2.61555021108721,
3.85711807608966, 7.67091954093986, 9.24609859359625, 0.436967673925767,
1.32063426377087, 0.794911931890139, 1.16453130413352)), .Names = c("X.real",
"Cat", "X"), row.names = c(NA, -100L), class = "data.frame")
The R/BUGS Bayesian model:
system.time(Mod <- local({
    Mod <- function(){ ## BUGS model as an R function. Easy to manage.
        ## Thresholds.
        for (j in 1:ncut) {
            gamma.raw[j] ~ dnorm(0, 1.0E-4)
        }
        gamma <- sort(gamma.raw)
        ## p[i,j] = Pr(Cat[i]>j)
        ## q[i,j] = Pr(Cat[i]=j)
        ##        = Pr(Cat[i]>j-1) - Pr(Cat[i]>j)
        ##        = p[i,j-1] - p[i,j]
        for (i in 1:nobs) {
            Cat[i] ~ dcat(q[i,1:(ncut+1)])
            ## Predictions
            Cat.new[i] ~ dcat(q[i,1:(ncut+1)])
            q[i,1] <- 1 - p[i,1]
            for (j in 2:ncut) {
                q[i,j] <- p[i,j-1] - p[i,j]
            }
            q[i,ncut+1] <- p[i,ncut]
            for (j in 1:ncut) {
                logit(p[i,j]) <- beta.1*(X[i] - gamma[j])
            }
        }
        beta.1 ~ dnorm(0, 1.0E-4)
    }
    tmpf <- tempfile()
    ## Auxiliary function writing a function body out as a BUGS model
    ## file (write.model() as in R2WinBUGS).
    write.model(Mod, tmpf)
    Mod.jags <- jags.model(tmpf,
                           data=with(Data, {
                               Y <- outer(Cat, levels(Cat)[-1], ">=")
                               list(X=X,
                                    Cat=unclass(Cat),
                                    nobs=nrow(Y),
                                    ncut=ncol(Y))}),
                           ## "Reasonable" (not too overdispersed) initial
                           ## values are *CRUCIAL* to model initialisation
                           inits=function() {
                               list(gamma.raw=sort(runif(8, min=-1, max=1)),
                                    beta.1=runif(1, min=0, max=1))
                           },
                           n.chains=3)
    update(Mod.jags, n.iter=10000)
    unlink(tmpf)
    Mod.coda <- coda.samples(Mod.jags,
                             variable.names=c("gamma",
                                              ## "beta.0",
                                              "beta.1",
                                              ## "gamma.adj",
                                              ## "lzeta",
                                              ## "pi.sgn",
                                              "deviance",
                                              "Cat.new"),
                             thin=10,
                             n.iter=10000)
    list(Mod.jags=Mod.jags, Mod.coda=Mod.coda)
}))
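For what it's worth, here is how I would tentatively extend the BUGS
model above to the mixed case (a sketch only, not tested: a random shift
u[k] per level of a hypothetical grouping factor Id, with nid levels,
both passed as data alongside X, Cat, nobs and ncut):

Mod.mixed <- function(){
    for (j in 1:ncut) {
        gamma.raw[j] ~ dnorm(0, 1.0E-4)
    }
    gamma <- sort(gamma.raw)
    ## Random shifts: one per group, acting on all thresholds at once.
    for (k in 1:nid) {
        u[k] ~ dnorm(0, tau.u)
    }
    tau.u ~ dgamma(1.0E-3, 1.0E-3)  ## conventional vague prior on precision
    for (i in 1:nobs) {
        Cat[i] ~ dcat(q[i,1:(ncut+1)])
        q[i,1] <- 1 - p[i,1]
        for (j in 2:ncut) {
            q[i,j] <- p[i,j-1] - p[i,j]
        }
        q[i,ncut+1] <- p[i,ncut]
        for (j in 1:ncut) {
            logit(p[i,j]) <- beta.1*(X[i] - gamma[j]) + u[Id[i]]
        }
    }
    beta.1 ~ dnorm(0, 1.0E-4)
}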