[R-sig-ME] Mixed-model polytomous ordered logistic regression ?
Rune Haubo
rhbc at imm.dtu.dk
Mon Jan 3 19:54:19 CET 2011
Hi Emmanuel,
You can use the clmm() function in package ordinal to fit cumulative
link mixed models, which include the proportional odds mixed model that
you mention. The non-mixed equivalent, clm(), gives the same answers as
polr() and glm() on your example data.
clmm() currently only allows for one random intercept, but Ken
Knoblauch proposed a polmer() function
(http://finzi.psych.upenn.edu/R-sig-mixed-models/2010q2/003778.html)
that seems to be an extension of glmer() along the lines of your glm()
hack, but for mixed models; it allows several random-effects terms in
glmer() fashion. On the other hand, with clmm() you get a variety of
link functions as well as extensions to scale and nominal effects for
modelling non-proportional effects, structured thresholds, etc.
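For illustration, a minimal sketch (the grouping factor Id is invented
here, since your example data have none, and the clmm() formula syntax
is that of recent versions of ordinal):

library(ordinal)
## Non-mixed cumulative link model; agrees with polr(Cat ~ X, data = Data)
fm <- clm(Cat ~ X, data = Data, link = "logit")
## Mixed version, with a single random intercept for a grouping factor Id:
## fmm <- clmm(Cat ~ X + (1 | Id), data = Data, link = "logit")
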
The idea of reformulating the cumulative link model as a binomial
generalized linear model is good, but I haven't seen it described
before - can I ask which exercise, in which edition of Gelman & Hill
(2007), mentions this?
Cheers,
Rune
On 30 December 2010 23:27, Emmanuel Charpentier <emm.charpentier at free.fr> wrote:
> Dear list,
>
> Following a hint of an exercise in Gelman & Hill's (2007) regression
> textbook, I tried to understand how to build a function generalizing to
> mixed models what polr() (MASS) does for fixed models.
>
> I started with a very simple artificial dataset (see at end). Using the
> simplest polr() use :
>
>> summary(polr(Cat~X, data=Data))
>
> Re-fitting to get Hessian
>
> Call:
> polr(formula = Cat ~ X, data = Data)
>
> Coefficients:
> Value Std. Error t value
> X 9.557 1.821 5.247
>
> Intercepts:
> Value Std. Error t value
> C1|C2 14.8668 3.0447 4.8828
> C2|C3 24.2772 4.7119 5.1523
> C3|C4 34.2367 6.5861 5.1984
> C4|C5 43.3543 8.2390 5.2621
> C5|C6 53.8174 10.3240 5.2128
> C6|C7 63.5247 12.2352 5.1920
> C7|C8 72.5850 13.7899 5.2636
> C8|C9 82.2256 15.8144 5.1994
>
> Residual Deviance: 55.7395
> AIC: 73.7395
> Warning message:
> glm.fit: fitted probabilities numerically 0 or 1 occurred
>
> the following snippet:
>
> Glm.polr <- local({
>     ll <- length(lev <- levels(Data$Cat))
>     thr <- paste(lev[-ll], "|", lev[-1], sep = "")
>     Dataset <- do.call(rbind,
>                        lapply(2:ll, function(i) {
>                            data.frame(Thr = thr[i - 1],
>                                       Cat = Data$Cat >= lev[i],
>                                       Data[, names(Data)[!(names(Data) %in% c("Thr", "Cat"))]])
>                        }))
>     Dataset$Thr <- factor(Dataset$Thr)
>     glm(Cat ~ 0 + Thr + X, data = Dataset, family = binomial(link = logit))
> })
>
> allowed me to reproduce the polr() results (up to numerical precision,
> with which I did not tinker, and up to a couple of cosmetic details,
> the names and the signs of the intercepts), both for the central values
> and their variability:
>
>> summary(Glm.polr)
>
> Call:
> glm(formula = Cat ~ 0 + Thr + X, family = binomial(link = logit),
> data = Dataset)
>
> Deviance Residuals:
> Min 1Q Median 3Q Max
> -2.09822 -0.00001 0.00000 0.00000 1.92014
>
> Coefficients:
> Estimate Std. Error z value Pr(>|z|)
> ThrC1|C2 -14.858 3.050 -4.872 1.11e-06 ***
> ThrC2|C3 -24.263 4.722 -5.138 2.78e-07 ***
> ThrC3|C4 -34.217 6.600 -5.184 2.17e-07 ***
> ThrC4|C5 -43.329 8.257 -5.247 1.54e-07 ***
> ThrC5|C6 -53.786 10.346 -5.199 2.01e-07 ***
> ThrC6|C7 -63.489 12.260 -5.179 2.24e-07 ***
> ThrC7|C8 -72.542 13.820 -5.249 1.53e-07 ***
> ThrC8|C9 -82.174 15.852 -5.184 2.18e-07 ***
> X 9.551 1.825 5.233 1.67e-07 ***
> ---
> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> (Dispersion parameter for binomial family taken to be 1)
>
> Null deviance: 1109.035 on 800 degrees of freedom
> Residual deviance: 55.737 on 791 degrees of freedom
> AIC: 73.737
>
> Number of Fisher Scoring iterations: 12
>
> One should note that "large" thresholds (in absolute value) have larger
> standard errors, which is probably an artifact that I don't know how to
> get rid of. The glm claim of "800 degrees of freedom" is ridiculous: I
> have only 100 observations. More on this below...
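> (A quick check of where the 800 comes from: it is the stacked data, not
> the real sample. This assumes Glm.polr from the snippet above:)
>
> (nlevels(Data$Cat) - 1) * nrow(Data)   # 800 rows in the stacked data
> nrow(Glm.polr$model)                   # the same 800, as seen by glm()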
>
> Now, extending this to mixed models (using lme4's notation as a model)
> is not entirely trivial:
>
> The "fixed effect" part of the formula must *not* have an intercept
> (that's what the thresholds become): how to ensure this? (One possible
> way is sketched just below.)
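> A minimal sketch of one way, assuming the wrapper itself adds the
> threshold factor Thr to the fixed-effects part:
>
> strip_intercept <- function(f) update(f, . ~ 0 + Thr + .)
> strip_intercept(Cat ~ X)   # gives Cat ~ Thr + X - 1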
>
> In the "random effect" part, the intercepts can be replaced by the
> thresholds (one can think of legitimate reasons for which thresholds
> should depend on a random factor...). Should that be done also for
> implicit thresholds (e. g. (X | Id) means, in a GLM (implicitely)
> "variation of slope of X AND intercept according to Id". Should that
> become "variation of slope AND threshold according to Id" ?).
>
> How should the tinkering with the original data be handled?
> Systematically copying the original data can be heavy, but is safe. Are
> the built-in "copy-on-write" mechanisms of R sufficient? (An index-based
> alternative is sketched below.)
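> A sketch of the index-based alternative, on the assumption that R
> shares the indexed columns until they are modified:
>
> expand_idx <- function(data, resp) {
>     ll <- nlevels(data[[resp]])
>     list(row = rep(seq_len(nrow(data)), times = ll - 1L),
>          thr = rep(seq_len(ll - 1L), each = nrow(data)))
> }
> str(expand_idx(Data, "Cat"))   # 800 index pairs, no copied covariates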
>
> What "other" models would be useful in such a function ? Cloglog ?
> "Continuation" ? Others ?
>
> More generally, shouldn't such a function (polrer?) be part of some
> hypothetical (Platonic?) lme4? Ordered data are important in many cases
> where a numerical coding cannot be trusted, and the ability to fit
> mixed models to such data would be important. But doing this on a
> case-by-case basis is somewhat error-prone (especially when done by
> me :).
>
> For now, I cope with BUGS and a Bayesian interpretation. And herein
> lies a possibly important remark.
>
> My BUGS code (JAGS dialect, inspired by the "Bones" example of Classic
> BUGS vol. I) and the R call (through rjags) are at the end.
>
> The results call for some comments. Both graphical examination and some
> formal inspection show good convergence (the predictions are somewhat
> harder to stabilize...):
>
>> gelman.diag(Mod$Mod.coda[,-(1:100)]) ## First 100 columns are
> ## predicted values for the
> ## original data.
> Potential scale reduction factors:
>
> Point est. 97.5% quantile
> beta.1 1.00 1.00
> deviance 1.00 1.00
> gamma[1] 1.00 1.00
> gamma[2] 1.00 1.00
> gamma[3] 1.00 1.01
> gamma[4] 1.00 1.00
> gamma[5] 1.00 1.00
> gamma[6] 1.00 1.00
> gamma[7] 1.00 1.00
> gamma[8] 1.00 1.00
>
> Multivariate psrf
>
> 1.00
>
> The fitted values and their variability are cosmetically quite
> different (the Bayesian model's gamma is the equivalent of the ratio
> threshold / X-coefficient of the frequentist polr/glm model: e.g.
> 14.87 / 9.56 = 1.56, close to the posterior mean of gamma[1], 1.544),
> but the central values are quite consistent with the previous ones:
>
>> summary(Mod$Mod.coda[,-(1:100)])
>
> Iterations = 11010:21000
> Thinning interval = 10
> Number of chains = 3
> Sample size per chain = 1000
>
> 1. Empirical mean and standard deviation for each variable,
> plus standard error of the mean:
>
> Mean SD Naive SE Time-series SE
> beta.1 8.927 1.68312 0.030729 0.027433
> deviance 64.674 4.26684 0.077902 0.087455
> gamma[1] 1.544 0.08815 0.001609 0.001649
> gamma[2] 2.544 0.10957 0.002000 0.002065
> gamma[3] 3.580 0.09930 0.001813 0.001708
> gamma[4] 4.544 0.08445 0.001542 0.001607
> gamma[5] 5.630 0.11998 0.002191 0.002137
> gamma[6] 6.627 0.12406 0.002265 0.002463
> gamma[7] 7.606 0.11175 0.002040 0.002019
> gamma[8] 8.608 0.18558 0.003388 0.003343
>
> 2. Quantiles for each variable:
>
> 2.5% 25% 50% 75% 97.5%
> beta.1 6.180 7.741 8.799 9.878 12.582
> deviance 58.271 61.458 64.116 67.112 74.836
> gamma[1] 1.363 1.489 1.548 1.604 1.706
> gamma[2] 2.333 2.469 2.541 2.616 2.765
> gamma[3] 3.386 3.512 3.582 3.647 3.779
> gamma[4] 4.385 4.487 4.543 4.600 4.713
> gamma[5] 5.406 5.545 5.630 5.713 5.870
> gamma[6] 6.358 6.550 6.633 6.715 6.851
> gamma[7] 7.392 7.529 7.603 7.682 7.821
> gamma[8] 8.266 8.470 8.608 8.738 8.967
>
> One should note, however, that, while the central value and sampling SD
> of beta are consistent with the polr/glm results, the SD/mean ratios of
> the thresholds are much smaller than with polr/glm, and this is
> reflected in the credible intervals. Furthermore, the SDs do not seem
> to increase as much with the absolute value of the central values.
> (Yes, I am aware that centering my X values would reduce this
> phenomenon, but I kept them uncentered to illustrate my point.)
>
> For example, the 0.95 confidence interval of the C1|C2 threshold is
> about 6.1 wide (about 41% of the mean), whereas the 0.95 credible
> interval of gamma[1] is (1.36, 1.70), 0.34 wide, which is about 25% of
> its mean. For the C8|C9 threshold, the CI95/mean ratio is about 37%,
> while the CrI95/mean ratio of gamma[8] is 8.1%.
>
> Given that the two models are respectively a frequentist and a Bayesian
> interpretation of the very same probability model, this is perplexing.
>
> Are there reasons to suspect that the SDs reported by glm and/or polr
> are inflated (for example, these models do not seem to account for the
> re-use of the same data for the estimation of the various thresholds)?
> Or is there a (maybe not so subtle) error in the Bayesian
> interpretation of the model in BUGS? (A quick frequentist cross-check
> is sketched below.)
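> One quick cross-check on the frequentist side: profile-likelihood
> intervals against the Wald intervals implied by the reported SEs:
>
> confint(Glm.polr)           # profile likelihood, via MASS
> confint.default(Glm.polr)   # Wald, from the same SEs as summary()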
>
> Or am I barking up the wrong tree?
>
> Any idea would be welcome...
>
> Sincerely yours,
>
> Emmanuel Charpentier
>
> The artificial data:
> Data <-
> structure(list(X.real = c(3.49764437862872, 6.90659633160501,
> 2.49335390384994, 7.24293843863257, 0.803964727576124, 4.04213878877333,
> 1.06214939381180, 0.554054259416088, 4.08729099177371, 2.52629057633986,
> 3.24127380430347, 1.11891797458651, 0.530158326675324, 9.13663197362062,
> 4.50887509479721, 8.28526961997477, 4.88346597634514, 7.1687905770683,
> 8.30770739436844, 4.10970688126136, 0.926142793485848, 1.60153659527237,
> 7.9980129347903, 7.09517767824342, 3.13154316634710, 3.73947230927693,
> 2.80221854118421, 6.59984440492743, 2.61928305396566, 7.01599863310761,
> 2.24968126253923, 1.58829945203618, 1.90902984485251, 5.12920046748922,
> 9.2118070532461, 9.08208307216233, 4.43550132482185, 8.05729871351512,
> 1.18297174176127, 8.46910981252808, 4.43676303030525, 6.4418454794069,
> 4.53446029962874, 0.699769024165557, 8.46097165332483, 4.53799448354545,
> 5.27431575051651, 7.54335240307173, 2.17309956012568, 5.88721264347949,
> 9.4071993099453, 1.88089249201845, 5.3721556593878, 0.771888320064126,
> 8.1072438324685, 6.31264339382069, 2.00734973388480, 1.55494366180340,
> 7.02181982951623, 2.30599238507652, 0.527109325928194, 3.66541005952039,
> 5.5545317236498, 1.09950173374848, 0.589535171670769, 3.53540683843517,
> 4.45889129379049, 6.80170522713363, 7.37081649369139, 9.4889221652781,
> 7.19282466718394, 6.13643707306052, 1.35088176999453, 3.16405426710045,
> 3.32833977017296, 6.08074276163767, 1.63873091713528, 1.46557732912316,
> 4.91770214161358, 0.938513984685997, 1.48740958479596, 2.01597792723018,
> 4.21291073575364, 5.43887605123115, 4.70221174192061, 1.67668472528311,
> 5.24846045233372, 9.0111473276549, 7.23671006856042, 3.72080675801418,
> 4.39171014756293, 4.81283564404103, 2.41342165250038, 3.82321065255632,
> 7.669782577978, 9.2234752201292, 0.641411897606775, 1.13470612170815,
> 1.02966988829139, 1.24689684186557), Cat = structure(c(3L, 7L,
> 2L, 7L, 1L, 4L, 1L, 1L, 4L, 3L, 3L, 1L, 1L, 9L, 5L, 8L, 5L, 7L,
> 8L, 4L, 1L, 2L, 8L, 7L, 3L, 4L, 3L, 7L, 3L, 7L, 2L, 2L, 2L, 5L,
> 9L, 9L, 4L, 8L, 1L, 8L, 4L, 6L, 5L, 1L, 8L, 5L, 5L, 8L, 2L, 6L,
> 9L, 2L, 5L, 1L, 8L, 6L, 2L, 2L, 7L, 2L, 1L, 4L, 6L, 1L, 1L, 4L,
> 4L, 7L, 7L, 9L, 7L, 6L, 1L, 3L, 3L, 6L, 2L, 1L, 5L, 1L, 1L, 2L,
> 4L, 5L, 5L, 2L, 5L, 9L, 7L, 4L, 4L, 5L, 2L, 4L, 8L, 9L, 1L, 1L,
> 1L, 1L), .Label = c("C1", "C2", "C3", "C4", "C5", "C6", "C7",
> "C8", "C9"), class = c("ordered", "factor")), X = c(3.78512255820666,
> 7.03710649735623, 2.37579547880536, 7.01687069036889, 0.572774665466921,
> 4.53333128338774, 0.92564311992818, 0.682074690286214, 4.17197303371935,
> 2.62951212025962, 3.44249595134830, 1.02688214798057, 0.440959503725193,
> 9.05785834195666, 4.52616407157106, 8.07448237191856, 4.95371782896727,
> 7.35730414103575, 8.07638469674326, 4.26407495806574, 1.06385684713162,
> 1.63333280184271, 7.9246178454581, 7.41541196525404, 3.12459209709868,
> 3.64101167776984, 2.51917629411974, 6.8118629850261, 2.77786400733997,
> 7.36566898280035, 2.03009770691198, 1.61723070290973, 1.96657910701987,
> 5.22600151472831, 9.21386030948403, 9.18902347324618, 4.39493834305473,
> 8.27266371630105, 1.15477068640133, 8.15280513569419, 4.6734050583675,
> 6.51664088906166, 4.40955269905174, 0.54646886424704, 8.03605092676131,
> 4.36779957410513, 5.44860695752003, 7.83429549417553, 1.84916430642593,
> 5.81109382387844, 9.32133297147632, 1.73276033539495, 5.3490390105937,
> 0.695781216499173, 8.05603266764859, 6.67741469458807, 2.03017543225991,
> 1.60754713124494, 6.63481861913185, 2.20004584719421, 0.675994410762504,
> 3.98953568167253, 5.45631429591027, 0.675175153754915, 0.305522085250977,
> 3.58440905189674, 4.58557708778191, 6.99911313460102, 7.21102458968007,
> 9.44650050070282, 7.33311596756764, 5.87485729872331, 1.11824133536435,
> 3.40701638829918, 3.36818213997361, 5.92742193489156, 2.08701427718401,
> 1.73709762230381, 4.71589786549998, 0.980248052739725, 1.58957599464140,
> 2.17701413093404, 4.22999685414555, 5.84947271423299, 4.94396458250251,
> 1.65133703600731, 5.17336780283038, 8.8993318875765, 7.29238547003421,
> 3.62020801752199, 4.28550112286315, 4.95543088867053, 2.61555021108721,
> 3.85711807608966, 7.67091954093986, 9.24609859359625, 0.436967673925767,
> 1.32063426377087, 0.794911931890139, 1.16453130413352)), .Names =
> c("X.real", "Cat", "X"), row.names = c(NA, -100L), class = "data.frame")
>
> The R/BUGS Bayesian model:
>
> system.time(Mod <- local({
>     ## Uses library(rjags); write.model() comes from the R2WinBUGS
>     ## package. Monitoring "deviance" needs JAGS's dic module
>     ## (load.module("dic")) if it is not already loaded.
>     Mod <- function() { ## BUGS model as an R function. Easy to manage.
>         ## Thresholds.
>         for (j in 1:ncut) {
>             gamma.raw[j] ~ dnorm(0, 1.0E-4)
>         }
>         gamma <- sort(gamma.raw)
>         ## p[i,j] = Pr(Cat[i] > j)
>         ## q[i,j] = Pr(Cat[i] = j)
>         ##        = Pr(Cat[i] > j-1) - Pr(Cat[i] > j)
>         ##        = p[i,j-1] - p[i,j]
>         for (i in 1:nobs) {
>             Cat[i] ~ dcat(q[i, 1:(ncut+1)])
>             ## Predictions
>             Cat.new[i] ~ dcat(q[i, 1:(ncut+1)])
>             q[i, 1] <- 1 - p[i, 1]
>             for (j in 2:ncut) {
>                 q[i, j] <- p[i, j-1] - p[i, j]
>             }
>             q[i, ncut+1] <- p[i, ncut]
>             for (j in 1:ncut) {
>                 logit(p[i, j]) <- beta.1 * (X[i] - gamma[j])
>             }
>         }
>         beta.1 ~ dnorm(0, 1.0E-4)
>     }
>     tmpf <- tempfile()
>     write.model(Mod, tmpf) ## Auxiliary function writing the function
>                            ## body out as a BUGS model file.
>     Mod.jags <- jags.model(tmpf,
>                            data = with(Data, {
>                                Y <- outer(Cat, levels(Cat)[-1], ">=")
>                                list(X = X,
>                                     Cat = unclass(Cat),
>                                     nobs = nrow(Y),
>                                     ncut = ncol(Y))
>                            }),
>                            ## "Reasonable" (not too overdispersed) initial
>                            ## values are *CRUCIAL* to model initialisation
>                            inits = function() {
>                                list(gamma.raw = sort(runif(8, min = -1, max = 1)),
>                                     beta.1 = runif(1, min = 0, max = 1))
>                            },
>                            n.chains = 3)
>     update(Mod.jags, n.iter = 10000)
>     unlink(tmpf)
>     Mod.coda <- coda.samples(Mod.jags,
>                              variable.names = c("gamma",
>                                                 ## "beta.0",
>                                                 "beta.1",
>                                                 ## "gamma.adj",
>                                                 ## "lzeta",
>                                                 ## "pi.sgn",
>                                                 "deviance",
>                                                 "Cat.new"),
>                              thin = 10,
>                              n.iter = 10000)
>     list(Mod.jags = Mod.jags, Mod.coda = Mod.coda)
> }))
>
--
Rune Haubo Bojesen Christensen
PhD Student, M.Sc. Eng.
Phone: (+45) 45 25 33 63
Mobile: (+45) 30 26 45 54
DTU Informatics, Section for Statistics
Technical University of Denmark, Build. 305, Room 122,
DK-2800 Kgs. Lyngby, Denmark