[R-sig-ME] Mixed-model polytomous ordered logistic regression ?
Emmanuel Charpentier
emm.charpentier at free.fr
Mon Jan 3 22:53:59 CET 2011
Le lundi 03 janvier 2011 à 19:54 +0100, Rune Haubo a écrit :
> Hi Emmanuel,
>
> You can use the clmm() function in package ordinal to fit cumulative
> link mixed models which include the proportional odds mixed model that
> you mention. The non-mixed equivalent, clm() gives the same answers as
> polr() and glm() on your example data.
>
> clmm() currently only allows for one random intercept, but Ken
> Knoblauch proposed a polmer() function
> (http://finzi.psych.upenn.edu/R-sig-mixed-models/2010q2/003778.html)
> that seems to be an extension of glmer() similar to your glm() hack,
> but for mixed models.
That was my point, and Ken Knoblauch has been kind enough to send me
his code, which seems quite good. I have not yet worked on it (and
probably won't for the next month at least), but it's probably a good
start.
The "800 degrees of freedom" problem still bothers me. But that's another
problem (see below).
> This allows for several random effects in a
> glmer() fashion. On the other hand, with clmm() you get a variety of
> link functions and extensions to scale and nominal effects for
> modelling non-proportional effects, structured thresholds etc.
Thank you for the hint. I'll have a look at that. I have trouble
remembering which of the 2500+ R packages available on CRAN will solve
my problem-of-the-moment. Furthermore, I tend to place more trust in
packages doing something I've personally tinkered with. Hence my current
infatuation with BUGS (i.e. JAGS for the time being)...
But my point wasn't to solve a specific problem, but to point out that
ordinal data are a sufficiently frequent case to deserve "first-class
treatment". I am old enough to have been taught what was then called
"the general linear model", which was about the only case where
multivariate models could be realistically fitted. Things tended to expand
a bit with censored data and the introduction of the Cox model, then with
Poisson and logistic regression. I tend to think of ordered data as
another kind of data that should be analysable the same way. That's what
polr() partially does (with a limited but wisely chosen set of
options).
Similarly, mixed models became *practically* fittable first for the
dreaded Gaussian case, then for Poisson, binomial and negative-binomial
cases. Ordered data should be analysable the same way.
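For the record, a minimal sketch of what such a mixed-model fit looks like with clmm() from the ordinal package, on the wine data shipped with that package (the formula-style random term and variable names are my assumptions from the package documentation; I have not carefully checked this myself):

```r
## Sketch: proportional odds mixed model with clmm() (package ordinal).
## rating is an ordered factor; temp and contact are fixed effects;
## judge enters as a random intercept.
library(ordinal)
data(wine)
fm <- clmm(rating ~ temp + contact + (1 | judge), data = wine)
summary(fm)  ## thresholds, fixed-effect coefficients, judge variance
```

This is exactly the kind of "first-class treatment" I mean: the ordinal response handled with the same formula interface as lmer()/glmer().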
Another case (not directly in my current interests, but one never
knows...) is categorical data. Log-linear models allow for a similar
treatment, and should also be analysable in the mixed-models case.
A lot is to be learned from the Bayesian models for these kinds of
problems. I'm currently exploring the (now somewhat outdated) Congdon
textbooks on Bayesian modeling, and they raise interesting ideas about
what should be doable from a frequentist point of view...
On the other hand, I'm more and more inclined to think that the Bayesian
point of view is extremely valid. The first difficulty is to convince
journals' editors that "p-value is crack" and that a posterior joint
distribution says all that can be learned from an
experiment/observation, provided the priors are correctly chosen. The
second (and harder) is to find ways to express Bayesian probability
models in a more concise way than what is currently allowed by BUGS:
one should not have to type dozens of almost-identical snippets of code
for each and every intermediate parameter appearing in such a model. My
current idea is that some sort of macro facility should help, but it is
hard to design *correctly*. The third (and hardest) difficulty is to
implement that efficiently. I'm a bit struck to see all "public" BUGS
implementations using only one CPU of a 4- or 8-CPU machine, even for parts
(model updating of parallel chains) that are "embarrassingly
parallel" ((C) D. Spiegelhalter & al.).
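To make the "macro facility" idea concrete, here is a toy sketch of my own (not an existing BUGS or JAGS feature): generating the near-identical prior declarations from R by string-pasting, instead of typing them by hand:

```r
## Toy "macro": generate repetitive BUGS prior snippets from R.
## My own string-pasting sketch, not an existing facility.
bugs.priors <- function(pars, prior = "dnorm(0, 1.0E-4)") {
  paste(pars, " ~ ", prior, sep = "", collapse = "\n")
}
cat(bugs.priors(c("beta.0", "beta.1", "beta.2")))
## beta.0 ~ dnorm(0, 1.0E-4)
## beta.1 ~ dnorm(0, 1.0E-4)
## beta.2 ~ dnorm(0, 1.0E-4)
```

A real facility would of course have to handle indexing, loops and hygiene correctly, which is where the design gets hard.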
In our present case, the Bayesian answer "feels" more reasonable than
the answer given by polr/glm, in particular because the estimated
variability of the thresholds seems to avoid the artefact induced by a
non-centered independent variable...
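A quick way to probe that artefact (a sketch using the Data object reproduced at the end of my earlier message below; I expect, but have not carefully checked, that the threshold standard errors stop growing with the raw thresholds' magnitude):

```r
## Sketch: refit polr() with X centred, to see whether the threshold
## standard errors lose the inflation observed with raw X.
library(MASS)
summary(polr(Cat ~ I(X - mean(X)), data = Data))
## The thresholds are now expressed relative to mean(X); their SEs
## should no longer grow with the distance of the raw cuts from zero.
```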
> The idea of reformulating the cumulative link model as a binomial
> generalized linear model is good, but I haven't seen it described
> before - can I ask you which exercise in what edition of Gelman & Hill
> (2007) that mentions this?
Ex 15.1, section 15.5 of Gelman & Hill (2007), p. 342. The printer's
info at the beginning of the book (just after the full title page, but I
don't know what this page is called in American usage) says it's "First
published 2007//Reprinted with corrections 2007//10th printing 2009".
I am not aware of any other edition.
HTH,
Emmanuel Charpentier
> Cheers,
> Rune
>
> On 30 December 2010 23:27, Emmanuel Charpentier <emm.charpentier at free.fr> wrote:
> > Dear list,
> >
> > Following a hint of an exercise in Gelman & Hill's (2007) regression
> > textbook, I tried to understand how to build a function generalizing to
> > mixed models what polr() (MASS) does for fixed models.
> >
> > I started with a very simple artificial dataset (see at end). Using the
> > simplest polr() use :
> >
> >> summary(polr(Cat~X, data=Data))
> >
> > Re-fitting to get Hessian
> >
> > Call:
> > polr(formula = Cat ~ X, data = Data)
> >
> > Coefficients:
> > Value Std. Error t value
> > X 9.557 1.821 5.247
> >
> > Intercepts:
> > Value Std. Error t value
> > C1|C2 14.8668 3.0447 4.8828
> > C2|C3 24.2772 4.7119 5.1523
> > C3|C4 34.2367 6.5861 5.1984
> > C4|C5 43.3543 8.2390 5.2621
> > C5|C6 53.8174 10.3240 5.2128
> > C6|C7 63.5247 12.2352 5.1920
> > C7|C8 72.5850 13.7899 5.2636
> > C8|C9 82.2256 15.8144 5.1994
> >
> > Residual Deviance: 55.7395
> > AIC: 73.7395
> > Warning message:
> > glm.fit: fitted probabilities numerically 0 or 1 occurred
> >
> > the following snippet :
> >
> > Glm.polr <- local({
> >   ll <- length(lev <- levels(Data$Cat))
> >   thr <- paste(lev[-ll], "|", lev[-1], sep="")
> >   Dataset <- do.call(rbind,
> >     lapply(2:ll,
> >       function(i) {
> >         data.frame(Thr=thr[i-1],
> >                    Cat=Data$Cat>=lev[i],
> >                    Data[, names(Data)[!(names(Data) %in% c("Thr", "Cat"))]])}))
> >   Dataset$Thr <- factor(Dataset$Thr)
> >   glm(Cat~0+Thr+X, data=Dataset, family=binomial(link=logit))
> > })
> >
> > made me able to reproduce my standard (up to numerical precision, with
> > which I didn't tinker, and a couple of cosmetic details (names and sign
> > of intercepts)), both for central value and variability :
> >
> >> summary(Glm.polr)
> >
> > Call:
> > glm(formula = Cat ~ 0 + Thr + X, family = binomial(link = logit),
> > data = Dataset)
> >
> > Deviance Residuals:
> > Min 1Q Median 3Q Max
> > -2.09822 -0.00001 0.00000 0.00000 1.92014
> >
> > Coefficients:
> > Estimate Std. Error z value Pr(>|z|)
> > ThrC1|C2 -14.858 3.050 -4.872 1.11e-06 ***
> > ThrC2|C3 -24.263 4.722 -5.138 2.78e-07 ***
> > ThrC3|C4 -34.217 6.600 -5.184 2.17e-07 ***
> > ThrC4|C5 -43.329 8.257 -5.247 1.54e-07 ***
> > ThrC5|C6 -53.786 10.346 -5.199 2.01e-07 ***
> > ThrC6|C7 -63.489 12.260 -5.179 2.24e-07 ***
> > ThrC7|C8 -72.542 13.820 -5.249 1.53e-07 ***
> > ThrC8|C9 -82.174 15.852 -5.184 2.18e-07 ***
> > X 9.551 1.825 5.233 1.67e-07 ***
> > ---
> > Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
> >
> > (Dispersion parameter for binomial family taken to be 1)
> >
> > Null deviance: 1109.035 on 800 degrees of freedom
> > Residual deviance: 55.737 on 791 degrees of freedom
> > AIC: 73.737
> >
> > Number of Fisher Scoring iterations: 12
> >
> > One should note that "large" thresholds (in absolute value) have larger
> > standard errors, which is probably an artifact that I don't know how to
> > get rid of. The glm claim of "800 degrees of freedom" is ridiculous: I
> > have only 100 observations. More on this below...
> >
> > Now, to extend this to mixed models (using lme4's notation as model) is
> > not absolutely trivial :
> >
> > The "fixed effect" part of the formula must *not* have intercepts (that's
> > what the thresholds become) : how to ensure this ?
> >
> > In the "random effect" part, the intercepts can be replaced by the
> > thresholds (one can think of legitimate reasons for which thresholds
> > should depend on a random factor...). Should that be done also for
> > implicit thresholds (e.g. (X | Id) means, in a GLM (implicitly),
> > "variation of slope of X AND intercept according to Id". Should that
> > become "variation of slope AND threshold according to Id" ?).
> >
> > How should the tinkering with the original data be handled ?
> > Systematically copying the original data can be heavy, but is safe. Can
> > the built-in "copy-on-write" mechanisms of R be sufficient ?
> >
> > What "other" models would be useful in such a function ? Cloglog ?
> > "Continuation" ? Others ?
> >
> > More generally, shouldn't such a function (polrer ?) be part of some
> > hypothetical (Platonician ?) lme4 ? Ordered data are important in many
> > cases where numerical expressions cannot be trusted, and the ability to
> > fit mixed models of these data would be important. But doing this on a
> > case by case basis is somewhat error-prone (especially when done by me :).
> >
> > For now, I cope with BUGS and Bayesian interpretation. And herein lies
> > a possibly important remark.
> >
> > My BUGS code (JAGS dialect, inspired by the "Bones" example of Classic
> > Bugs vol I) and R call (through rjags) is at end.
> >
> > The results call for some comments. Both graphical examination and some
> > formal inspection show good convergence (the predictions are somewhat
> > harder to stabilize...) :
> >
> >> gelman.diag(Mod$Mod.coda[,-(1:100)]) ## First 100 columns are
> > ## predicted values for the
> > ## original data.
> > Potential scale reduction factors:
> >
> > Point est. 97.5% quantile
> > beta.1 1.00 1.00
> > deviance 1.00 1.00
> > gamma[1] 1.00 1.00
> > gamma[2] 1.00 1.00
> > gamma[3] 1.00 1.01
> > gamma[4] 1.00 1.00
> > gamma[5] 1.00 1.00
> > gamma[6] 1.00 1.00
> > gamma[7] 1.00 1.00
> > gamma[8] 1.00 1.00
> >
> > Multivariate psrf
> >
> > 1.00
> >
> > The fitted data and their variability are cosmetically quite different
> > (the Bayesian model's gamma is the equivalent of the Threshold/X
> > coefficient of the polr/glm frequentist model), but their central value
> > quite consistent with previous values :
> >
> >> summary(Mod$Mod.coda[,-(1:100)])
> >
> > Iterations = 11010:21000
> > Thinning interval = 10
> > Number of chains = 3
> > Sample size per chain = 1000
> >
> > 1. Empirical mean and standard deviation for each variable,
> > plus standard error of the mean:
> >
> > Mean SD Naive SE Time-series SE
> > beta.1 8.927 1.68312 0.030729 0.027433
> > deviance 64.674 4.26684 0.077902 0.087455
> > gamma[1] 1.544 0.08815 0.001609 0.001649
> > gamma[2] 2.544 0.10957 0.002000 0.002065
> > gamma[3] 3.580 0.09930 0.001813 0.001708
> > gamma[4] 4.544 0.08445 0.001542 0.001607
> > gamma[5] 5.630 0.11998 0.002191 0.002137
> > gamma[6] 6.627 0.12406 0.002265 0.002463
> > gamma[7] 7.606 0.11175 0.002040 0.002019
> > gamma[8] 8.608 0.18558 0.003388 0.003343
> >
> > 2. Quantiles for each variable:
> >
> > 2.5% 25% 50% 75% 97.5%
> > beta.1 6.180 7.741 8.799 9.878 12.582
> > deviance 58.271 61.458 64.116 67.112 74.836
> > gamma[1] 1.363 1.489 1.548 1.604 1.706
> > gamma[2] 2.333 2.469 2.541 2.616 2.765
> > gamma[3] 3.386 3.512 3.582 3.647 3.779
> > gamma[4] 4.385 4.487 4.543 4.600 4.713
> > gamma[5] 5.406 5.545 5.630 5.713 5.870
> > gamma[6] 6.358 6.550 6.633 6.715 6.851
> > gamma[7] 7.392 7.529 7.603 7.682 7.821
> > gamma[8] 8.266 8.470 8.608 8.738 8.967
> >
> > One should note, however, that, while the central value and sampling StD
> > of beta are consistent with polr/glm results, the ratios SD/mean of the
> > thresholds are much smaller than with polr/glm, and this is reflected in
> > the credible intervals. Furthermore, the SDs do not seem to increase as
> > much with the (absolute value) of the central values. (Yes, I am aware
> > that centering my X values would reduce this phenomenon, but I kept them
> > uncentered to illustrate my point).
> >
> > For example, the 0.95 confidence interval of the C1|C2 threshold is about
> > 6.1 wide (about 41% of the mean) whereas the 0.95 credible interval of
> > gamma[1] is (1.36, 1.70), 0.34 wide, which is about 25% of its mean. For
> > the C8|C9 threshold, the CI95/mean ratio is about 37%, while
> > the CrI95/mean of gamma[8] is 8.1%.
> >
> > Given that the two models are respectively a frequentist and a Bayesian
> > treatment of the very same probability model, this is perplexing.
> >
> > Are there reasons to suspect that the SDs reported by glm and/or polr are
> > inflated (for example, these models do not seem to account for the re-use
> > of the same data for the estimation of the various thresholds) ? Or is
> > there a (maybe not so subtle) error in the Bayesian model interpretation
> > in BUGS ?
> >
> > Or am I barking up the wrong tree ?
> >
> > Any idea would be welcome ...
> >
> > Sincerely yours,
> >
> > Emmanuel Charpentier
> >
> > The artificial data :
> > Data <-
> > structure(list(X.real = c(3.49764437862872, 6.90659633160501,
> > 2.49335390384994, 7.24293843863257, 0.803964727576124, 4.04213878877333,
> > 1.06214939381180, 0.554054259416088, 4.08729099177371, 2.52629057633986,
> > 3.24127380430347, 1.11891797458651, 0.530158326675324, 9.13663197362062,
> > 4.50887509479721, 8.28526961997477, 4.88346597634514, 7.1687905770683,
> > 8.30770739436844, 4.10970688126136, 0.926142793485848, 1.60153659527237,
> > 7.9980129347903, 7.09517767824342, 3.13154316634710, 3.73947230927693,
> > 2.80221854118421, 6.59984440492743, 2.61928305396566, 7.01599863310761,
> > 2.24968126253923, 1.58829945203618, 1.90902984485251, 5.12920046748922,
> > 9.2118070532461, 9.08208307216233, 4.43550132482185, 8.05729871351512,
> > 1.18297174176127, 8.46910981252808, 4.43676303030525, 6.4418454794069,
> > 4.53446029962874, 0.699769024165557, 8.46097165332483, 4.53799448354545,
> > 5.27431575051651, 7.54335240307173, 2.17309956012568, 5.88721264347949,
> > 9.4071993099453, 1.88089249201845, 5.3721556593878, 0.771888320064126,
> > 8.1072438324685, 6.31264339382069, 2.00734973388480, 1.55494366180340,
> > 7.02181982951623, 2.30599238507652, 0.527109325928194, 3.66541005952039,
> > 5.5545317236498, 1.09950173374848, 0.589535171670769, 3.53540683843517,
> > 4.45889129379049, 6.80170522713363, 7.37081649369139, 9.4889221652781,
> > 7.19282466718394, 6.13643707306052, 1.35088176999453, 3.16405426710045,
> > 3.32833977017296, 6.08074276163767, 1.63873091713528, 1.46557732912316,
> > 4.91770214161358, 0.938513984685997, 1.48740958479596, 2.01597792723018,
> > 4.21291073575364, 5.43887605123115, 4.70221174192061, 1.67668472528311,
> > 5.24846045233372, 9.0111473276549, 7.23671006856042, 3.72080675801418,
> > 4.39171014756293, 4.81283564404103, 2.41342165250038, 3.82321065255632,
> > 7.669782577978, 9.2234752201292, 0.641411897606775, 1.13470612170815,
> > 1.02966988829139, 1.24689684186557), Cat = structure(c(3L, 7L,
> > 2L, 7L, 1L, 4L, 1L, 1L, 4L, 3L, 3L, 1L, 1L, 9L, 5L, 8L, 5L, 7L,
> > 8L, 4L, 1L, 2L, 8L, 7L, 3L, 4L, 3L, 7L, 3L, 7L, 2L, 2L, 2L, 5L,
> > 9L, 9L, 4L, 8L, 1L, 8L, 4L, 6L, 5L, 1L, 8L, 5L, 5L, 8L, 2L, 6L,
> > 9L, 2L, 5L, 1L, 8L, 6L, 2L, 2L, 7L, 2L, 1L, 4L, 6L, 1L, 1L, 4L,
> > 4L, 7L, 7L, 9L, 7L, 6L, 1L, 3L, 3L, 6L, 2L, 1L, 5L, 1L, 1L, 2L,
> > 4L, 5L, 5L, 2L, 5L, 9L, 7L, 4L, 4L, 5L, 2L, 4L, 8L, 9L, 1L, 1L,
> > 1L, 1L), .Label = c("C1", "C2", "C3", "C4", "C5", "C6", "C7",
> > "C8", "C9"), class = c("ordered", "factor")), X = c(3.78512255820666,
> > 7.03710649735623, 2.37579547880536, 7.01687069036889, 0.572774665466921,
> > 4.53333128338774, 0.92564311992818, 0.682074690286214, 4.17197303371935,
> > 2.62951212025962, 3.44249595134830, 1.02688214798057, 0.440959503725193,
> > 9.05785834195666, 4.52616407157106, 8.07448237191856, 4.95371782896727,
> > 7.35730414103575, 8.07638469674326, 4.26407495806574, 1.06385684713162,
> > 1.63333280184271, 7.9246178454581, 7.41541196525404, 3.12459209709868,
> > 3.64101167776984, 2.51917629411974, 6.8118629850261, 2.77786400733997,
> > 7.36566898280035, 2.03009770691198, 1.61723070290973, 1.96657910701987,
> > 5.22600151472831, 9.21386030948403, 9.18902347324618, 4.39493834305473,
> > 8.27266371630105, 1.15477068640133, 8.15280513569419, 4.6734050583675,
> > 6.51664088906166, 4.40955269905174, 0.54646886424704, 8.03605092676131,
> > 4.36779957410513, 5.44860695752003, 7.83429549417553, 1.84916430642593,
> > 5.81109382387844, 9.32133297147632, 1.73276033539495, 5.3490390105937,
> > 0.695781216499173, 8.05603266764859, 6.67741469458807, 2.03017543225991,
> > 1.60754713124494, 6.63481861913185, 2.20004584719421, 0.675994410762504,
> > 3.98953568167253, 5.45631429591027, 0.675175153754915, 0.305522085250977,
> > 3.58440905189674, 4.58557708778191, 6.99911313460102, 7.21102458968007,
> > 9.44650050070282, 7.33311596756764, 5.87485729872331, 1.11824133536435,
> > 3.40701638829918, 3.36818213997361, 5.92742193489156, 2.08701427718401,
> > 1.73709762230381, 4.71589786549998, 0.980248052739725, 1.58957599464140,
> > 2.17701413093404, 4.22999685414555, 5.84947271423299, 4.94396458250251,
> > 1.65133703600731, 5.17336780283038, 8.8993318875765, 7.29238547003421,
> > 3.62020801752199, 4.28550112286315, 4.95543088867053, 2.61555021108721,
> > 3.85711807608966, 7.67091954093986, 9.24609859359625, 0.436967673925767,
> > 1.32063426377087, 0.794911931890139, 1.16453130413352)), .Names = c
> > ("X.real",
> > "Cat", "X"), row.names = c(NA, -100L), class = "data.frame")
> >
> > The R/BUGS Bayesian model :
> >
> > system.time(Mod<-local({
> > Mod<-function(){ ## BUGS model as an R function. Easy to manage.
> > ## Thresholds.
> > for (j in 1:ncut) {
> > gamma.raw[j]~dnorm(0, 1.0E-4)
> > }
> > gamma<-sort(gamma.raw)
> > ## p[i,j]=Pr(Cat[i]>j)
> > ## q[i,j]=Pr(Cat[i]=j)
> > ## =Pr(Cat[i]>j-1)-Pr(Cat[i]>j)
> > ## =p[i,j-1]-p[i,j]
> > for (i in 1:nobs) {
> > Cat[i]~dcat(q[i,1:(ncut+1)])
> > ## Predictions
> > Cat.new[i]~dcat(q[i,1:(ncut+1)])
> > q[i,1]<-1-p[i,1]
> > for (j in 2:ncut) {
> > q[i,j]<-p[i,j-1]-p[i,j]
> > }
> > q[i,ncut+1]<-p[i,ncut]
> > for (j in 1:ncut) {
> > logit(p[i,j])<-beta.1*(X[i]-gamma[j])
> > }
> > }
> > beta.1~dnorm(0, 1.0E-4)
> > }
> > tmpf<-tempfile()
> > write.model(Mod, tmpf) ## Auxiliary function writing a function body
> > Mod.jags<-jags.model(tmpf,
> > data=with(Data, {
> > Y<-outer(Cat, levels(Cat)[-1], ">=")
> > list(X=X,
> > Cat=unclass(Cat),
> > nobs=nrow(Y),
> > ncut=ncol(Y))}),
> > ## "Reasonable" (not too overdispersed) initial
> > ## values are *CRUCIAL* to model initialisation
> > inits=function() {
> > list(gamma.raw=sort(runif(8,min=-1,max=1)),
> > beta.1=runif(1,min=0,max=1))
> > },
> > n.chains=3)
> > update(Mod.jags, n.iter=10000)
> > unlink(tmpf)
> > Mod.coda<-coda.samples(Mod.jags,
> > variable.names=c("gamma",
> > ## "beta.0",
> > "beta.1",
> > ## "gamma.adj",
> > ## "lzeta",
> > ## "pi.sgn",
> > "deviance",
> > "Cat.new"),
> > thin=10,
> > n.iter=10000)
> > list(Mod.jags=Mod.jags, Mod.coda=Mod.coda)
> > }))
> >
> > _______________________________________________
> > R-sig-mixed-models at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> >
>
>
>