[R-sig-ME] Mixed-model polytomous ordered logistic regression ?
Emmanuel Charpentier
emm.charpentier at free.fr
Thu Dec 30 23:27:41 CET 2010
Dear list,
Following a hint from an exercise in Gelman & Hill's (2007) regression
textbook, I tried to understand how to build a function generalizing to
mixed models what polr() (MASS) does for fixed-effects models.
I started with a very simple artificial dataset (see at end). Using the
simplest call to polr():
> summary(polr(Cat~X, data=Data))
Re-fitting to get Hessian
Call:
polr(formula = Cat ~ X, data = Data)
Coefficients:
Value Std. Error t value
X 9.557 1.821 5.247
Intercepts:
Value Std. Error t value
C1|C2 14.8668 3.0447 4.8828
C2|C3 24.2772 4.7119 5.1523
C3|C4 34.2367 6.5861 5.1984
C4|C5 43.3543 8.2390 5.2621
C5|C6 53.8174 10.3240 5.2128
C6|C7 63.5247 12.2352 5.1920
C7|C8 72.5850 13.7899 5.2636
C8|C9 82.2256 15.8144 5.1994
Residual Deviance: 55.7395
AIC: 73.7395
Warning message:
glm.fit: fitted probabilities numerically 0 or 1 occurred
The following snippet:
Glm.polr <- local({
    ll <- length(lev <- levels(Data$Cat))
    thr <- paste(lev[-ll], "|", lev[-1], sep="")
    ## Stack one copy of the data per threshold: the binary response is
    ## Cat >= lev[i], and the factor Thr identifies the threshold.
    Dataset <- do.call(rbind,
                       lapply(2:ll,
                              function(i) {
                                  data.frame(Thr=thr[i-1],
                                             Cat=Data$Cat >= lev[i],
                                             Data[, names(Data)[!(names(Data) %in% c("Thr", "Cat"))]])
                              }))
    Dataset$Thr <- factor(Dataset$Thr)
    glm(Cat ~ 0 + Thr + X, data=Dataset, family=binomial(link=logit))
})
allowed me to reproduce the polr() results (up to numerical precision,
which I did not tinker with, and a couple of cosmetic details: names and
signs of the intercepts), both for the central values and the variability:
> summary(Glm.polr)
Call:
glm(formula = Cat ~ 0 + Thr + X, family = binomial(link = logit),
data = Dataset)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.09822 -0.00001 0.00000 0.00000 1.92014
Coefficients:
Estimate Std. Error z value Pr(>|z|)
ThrC1|C2 -14.858 3.050 -4.872 1.11e-06 ***
ThrC2|C3 -24.263 4.722 -5.138 2.78e-07 ***
ThrC3|C4 -34.217 6.600 -5.184 2.17e-07 ***
ThrC4|C5 -43.329 8.257 -5.247 1.54e-07 ***
ThrC5|C6 -53.786 10.346 -5.199 2.01e-07 ***
ThrC6|C7 -63.489 12.260 -5.179 2.24e-07 ***
ThrC7|C8 -72.542 13.820 -5.249 1.53e-07 ***
ThrC8|C9 -82.174 15.852 -5.184 2.18e-07 ***
X 9.551 1.825 5.233 1.67e-07 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 1109.035 on 800 degrees of freedom
Residual deviance: 55.737 on 791 degrees of freedom
AIC: 73.737
Number of Fisher Scoring iterations: 12
One should note that "large" thresholds (in absolute value) have larger
standard errors, which is probably an artifact I don't know how to get
rid of. The glm claim of "800 degrees of freedom" is ridiculous: I have
only 100 observations; the stacked Dataset merely contains 8 copies of
them, one per threshold. More on this below...
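As a quick programmatic check that the two fits agree, something like the
following sketch should do (it assumes Data and Glm.polr as defined above;
polr() stores its intercepts in the zeta component of the fit):

library(MASS)
polr.fit <- polr(Cat ~ X, data = Data, Hess = TRUE)
## Slope and thresholds from polr(), side by side with the slope and
## (sign-flipped) threshold coefficients of the expanded glm.
cbind(polr = c(coef(polr.fit), polr.fit$zeta),
      glm  = c(coef(Glm.polr)["X"],
               -coef(Glm.polr)[paste("Thr", names(polr.fit$zeta), sep="")]))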
Now, extending this to mixed models (using lme4's notation as a model) is
not entirely trivial:
The "fixed effect" part of the formula must *not* have an intercept (that
is what the thresholds become): how to ensure this? (See the sketch below.)
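A minimal sketch of one possible check, assuming lme4's nobars() helper
(which strips the (...|...) terms from a formula; exported in recent lme4
versions); the helper name has.fixed.intercept is mine, not an existing
API:

has.fixed.intercept <- function(f)
    ## terms() records the intercept as attribute 0 or 1 once the
    ## random-effect terms have been removed by nobars().
    attr(terms(lme4::nobars(f)), "intercept") == 1
has.fixed.intercept(Cat ~ 0 + X + (X | Id))  # FALSE: acceptable
has.fixed.intercept(Cat ~ X + (X | Id))      # TRUE: reject, or fix it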
In the "random effect" part, the intercepts can be replaced by the
thresholds (one can think of legitimate reasons for which thresholds
should depend on a random factor...). Should that be done also for
implicit thresholds (e. g. (X | Id) means, in a GLM (implicitely)
"variation of slope of X AND intercept according to Id". Should that
become "variation of slope AND threshold according to Id" ?).
How should the tinkering with the original data be handled?
Systematically copying the original data can be heavy, but is safe. Can
the built-in "copy-on-write" mechanism of R be sufficient?
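R's copy-on-modify behaviour can be checked with tracemem() (available
when R is compiled with memory profiling, as the CRAN binaries are); a
minimal illustration:

x <- data.frame(a = 1:5)
tracemem(x)     # mark the object
y <- x          # no copy yet: y and x share the same memory
y$a[1] <- 0L    # only now does tracemem report a copy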
What "other" models would be useful in such a function ? Cloglog ?
"Continuation" ? Others ?
More generally, shouldn't such a function (polrer?) be part of some
hypothetical (Platonic?) lme4? Ordered data are important in many cases
where numerical expressions cannot be trusted, and the ability to fit
mixed models to such data would be important. But doing this on a
case-by-case basis is somewhat error-prone (especially when done by me :).
For now, I cope with BUGS and a Bayesian interpretation. And herein lies
a possibly important remark.
My BUGS code (JAGS dialect, inspired by the "Bones" example of Classic
BUGS vol. I) and the R call (through rjags) are at the end.
The results call for some comments. Both graphical examination and some
formal inspection show good convergence (the predictions are somewhat
harder to stabilize...):
> gelman.diag(Mod$Mod.coda[,-(1:100)]) ## First 100 columns are
## predicted values for the
## original data.
Potential scale reduction factors:
Point est. 97.5% quantile
beta.1 1.00 1.00
deviance 1.00 1.00
gamma[1] 1.00 1.00
gamma[2] 1.00 1.00
gamma[3] 1.00 1.01
gamma[4] 1.00 1.00
gamma[5] 1.00 1.00
gamma[6] 1.00 1.00
gamma[7] 1.00 1.00
gamma[8] 1.00 1.00
Multivariate psrf
1.00
The fitted parameters and their variability are cosmetically quite
different (the Bayesian model's gamma is the equivalent of the ratio
threshold/X-coefficient of the polr/glm frequentist model), but their
central values are quite consistent with the previous ones:
> summary(Mod$Mod.coda[,-(1:100)])
Iterations = 11010:21000
Thinning interval = 10
Number of chains = 3
Sample size per chain = 1000
1. Empirical mean and standard deviation for each variable,
plus standard error of the mean:
Mean SD Naive SE Time-series SE
beta.1 8.927 1.68312 0.030729 0.027433
deviance 64.674 4.26684 0.077902 0.087455
gamma[1] 1.544 0.08815 0.001609 0.001649
gamma[2] 2.544 0.10957 0.002000 0.002065
gamma[3] 3.580 0.09930 0.001813 0.001708
gamma[4] 4.544 0.08445 0.001542 0.001607
gamma[5] 5.630 0.11998 0.002191 0.002137
gamma[6] 6.627 0.12406 0.002265 0.002463
gamma[7] 7.606 0.11175 0.002040 0.002019
gamma[8] 8.608 0.18558 0.003388 0.003343
2. Quantiles for each variable:
2.5% 25% 50% 75% 97.5%
beta.1 6.180 7.741 8.799 9.878 12.582
deviance 58.271 61.458 64.116 67.112 74.836
gamma[1] 1.363 1.489 1.548 1.604 1.706
gamma[2] 2.333 2.469 2.541 2.616 2.765
gamma[3] 3.386 3.512 3.582 3.647 3.779
gamma[4] 4.385 4.487 4.543 4.600 4.713
gamma[5] 5.406 5.545 5.630 5.713 5.870
gamma[6] 6.358 6.550 6.633 6.715 6.851
gamma[7] 7.392 7.529 7.603 7.682 7.821
gamma[8] 8.266 8.470 8.608 8.738 8.967
One should note, however, that, while the central value and sampling SD
of beta are consistent with the polr/glm results, the SD/mean ratios of
the thresholds are much smaller than with polr/glm, and this is reflected
in the credible intervals. Furthermore, the SDs do not seem to increase
as much with the absolute value of the central values. (Yes, I am aware
that centering my X values would reduce this phenomenon, but I kept them
uncentered to illustrate my point.)
For example, the 0.95 confidence interval of the C1|C2 threshold is about
6.1 wide (about 41% of the mean), whereas the 0.95 credible interval of
gamma[1] is (1.36, 1.70), 0.34 wide, which is about 25% of its mean. For
the C8|C9 threshold, the CI95/mean ratio is about 37%, while the
CrI95/mean ratio of gamma[8] is 8.1%.
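These width-to-mean ratios can also be computed directly from the fitted
objects rather than by hand; a sketch (it assumes polr.fit from above and
uses full interval widths throughout, so the absolute numbers may differ
from the figures just quoted, but the contrast between the two fits is
the same):

## Frequentist: asymptotic 95% CI width / |estimate| per threshold.
se <- sqrt(diag(vcov(polr.fit)))[names(polr.fit$zeta)]
2 * qnorm(0.975) * se / abs(polr.fit$zeta)
## Bayesian: 95% CrI width / |posterior mean| per gamma[j].
s <- summary(Mod$Mod.coda[, -(1:100)])
qs <- s$quantiles[grep("^gamma", rownames(s$quantiles)), ]
ms <- s$statistics[grep("^gamma", rownames(s$statistics)), "Mean"]
(qs[, "97.5%"] - qs[, "2.5%"]) / abs(ms)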
Given that the two models are respectively a frequentist and a Bayesian
fit of the very same probability model, this is perplexing.
Are there reasons to suspect that the SDs reported by glm and/or polr are
inflated (for example, these models do not seem to account for the re-use
of the same data for the estimation of the various thresholds)? Or is
there a (maybe not so subtle) error in the Bayesian interpretation of the
model in BUGS?
Or am I barking up the wrong tree?
Any ideas would be welcome...
Sincerely yours,
Emmanuel Charpentier
The artificial data:
Data <-
structure(list(X.real = c(3.49764437862872, 6.90659633160501,
2.49335390384994, 7.24293843863257, 0.803964727576124, 4.04213878877333,
1.06214939381180, 0.554054259416088, 4.08729099177371, 2.52629057633986,
3.24127380430347, 1.11891797458651, 0.530158326675324, 9.13663197362062,
4.50887509479721, 8.28526961997477, 4.88346597634514, 7.1687905770683,
8.30770739436844, 4.10970688126136, 0.926142793485848, 1.60153659527237,
7.9980129347903, 7.09517767824342, 3.13154316634710, 3.73947230927693,
2.80221854118421, 6.59984440492743, 2.61928305396566, 7.01599863310761,
2.24968126253923, 1.58829945203618, 1.90902984485251, 5.12920046748922,
9.2118070532461, 9.08208307216233, 4.43550132482185, 8.05729871351512,
1.18297174176127, 8.46910981252808, 4.43676303030525, 6.4418454794069,
4.53446029962874, 0.699769024165557, 8.46097165332483, 4.53799448354545,
5.27431575051651, 7.54335240307173, 2.17309956012568, 5.88721264347949,
9.4071993099453, 1.88089249201845, 5.3721556593878, 0.771888320064126,
8.1072438324685, 6.31264339382069, 2.00734973388480, 1.55494366180340,
7.02181982951623, 2.30599238507652, 0.527109325928194, 3.66541005952039,
5.5545317236498, 1.09950173374848, 0.589535171670769, 3.53540683843517,
4.45889129379049, 6.80170522713363, 7.37081649369139, 9.4889221652781,
7.19282466718394, 6.13643707306052, 1.35088176999453, 3.16405426710045,
3.32833977017296, 6.08074276163767, 1.63873091713528, 1.46557732912316,
4.91770214161358, 0.938513984685997, 1.48740958479596, 2.01597792723018,
4.21291073575364, 5.43887605123115, 4.70221174192061, 1.67668472528311,
5.24846045233372, 9.0111473276549, 7.23671006856042, 3.72080675801418,
4.39171014756293, 4.81283564404103, 2.41342165250038, 3.82321065255632,
7.669782577978, 9.2234752201292, 0.641411897606775, 1.13470612170815,
1.02966988829139, 1.24689684186557), Cat = structure(c(3L, 7L,
2L, 7L, 1L, 4L, 1L, 1L, 4L, 3L, 3L, 1L, 1L, 9L, 5L, 8L, 5L, 7L,
8L, 4L, 1L, 2L, 8L, 7L, 3L, 4L, 3L, 7L, 3L, 7L, 2L, 2L, 2L, 5L,
9L, 9L, 4L, 8L, 1L, 8L, 4L, 6L, 5L, 1L, 8L, 5L, 5L, 8L, 2L, 6L,
9L, 2L, 5L, 1L, 8L, 6L, 2L, 2L, 7L, 2L, 1L, 4L, 6L, 1L, 1L, 4L,
4L, 7L, 7L, 9L, 7L, 6L, 1L, 3L, 3L, 6L, 2L, 1L, 5L, 1L, 1L, 2L,
4L, 5L, 5L, 2L, 5L, 9L, 7L, 4L, 4L, 5L, 2L, 4L, 8L, 9L, 1L, 1L,
1L, 1L), .Label = c("C1", "C2", "C3", "C4", "C5", "C6", "C7",
"C8", "C9"), class = c("ordered", "factor")), X = c(3.78512255820666,
7.03710649735623, 2.37579547880536, 7.01687069036889, 0.572774665466921,
4.53333128338774, 0.92564311992818, 0.682074690286214, 4.17197303371935,
2.62951212025962, 3.44249595134830, 1.02688214798057, 0.440959503725193,
9.05785834195666, 4.52616407157106, 8.07448237191856, 4.95371782896727,
7.35730414103575, 8.07638469674326, 4.26407495806574, 1.06385684713162,
1.63333280184271, 7.9246178454581, 7.41541196525404, 3.12459209709868,
3.64101167776984, 2.51917629411974, 6.8118629850261, 2.77786400733997,
7.36566898280035, 2.03009770691198, 1.61723070290973, 1.96657910701987,
5.22600151472831, 9.21386030948403, 9.18902347324618, 4.39493834305473,
8.27266371630105, 1.15477068640133, 8.15280513569419, 4.6734050583675,
6.51664088906166, 4.40955269905174, 0.54646886424704, 8.03605092676131,
4.36779957410513, 5.44860695752003, 7.83429549417553, 1.84916430642593,
5.81109382387844, 9.32133297147632, 1.73276033539495, 5.3490390105937,
0.695781216499173, 8.05603266764859, 6.67741469458807, 2.03017543225991,
1.60754713124494, 6.63481861913185, 2.20004584719421, 0.675994410762504,
3.98953568167253, 5.45631429591027, 0.675175153754915, 0.305522085250977,
3.58440905189674, 4.58557708778191, 6.99911313460102, 7.21102458968007,
9.44650050070282, 7.33311596756764, 5.87485729872331, 1.11824133536435,
3.40701638829918, 3.36818213997361, 5.92742193489156, 2.08701427718401,
1.73709762230381, 4.71589786549998, 0.980248052739725, 1.58957599464140,
2.17701413093404, 4.22999685414555, 5.84947271423299, 4.94396458250251,
1.65133703600731, 5.17336780283038, 8.8993318875765, 7.29238547003421,
3.62020801752199, 4.28550112286315, 4.95543088867053, 2.61555021108721,
3.85711807608966, 7.67091954093986, 9.24609859359625, 0.436967673925767,
1.32063426377087, 0.794911931890139, 1.16453130413352)), .Names = c("X.real",
"Cat", "X"), row.names = c(NA, -100L), class = "data.frame")
The R/BUGS Bayesian model:
system.time(Mod <- local({
    Mod <- function(){ ## BUGS model as an R function. Easy to manage.
        ## Thresholds.
        for (j in 1:ncut) {
            gamma.raw[j] ~ dnorm(0, 1.0E-4)
        }
        gamma <- sort(gamma.raw)
        ## p[i,j] = Pr(Cat[i]>j)
        ## q[i,j] = Pr(Cat[i]=j)
        ##        = Pr(Cat[i]>j-1) - Pr(Cat[i]>j)
        ##        = p[i,j-1] - p[i,j]
        for (i in 1:nobs) {
            Cat[i] ~ dcat(q[i,1:(ncut+1)])
            ## Predictions
            Cat.new[i] ~ dcat(q[i,1:(ncut+1)])
            q[i,1] <- 1 - p[i,1]
            for (j in 2:ncut) {
                q[i,j] <- p[i,j-1] - p[i,j]
            }
            q[i,ncut+1] <- p[i,ncut]
            for (j in 1:ncut) {
                logit(p[i,j]) <- beta.1*(X[i] - gamma[j])
            }
        }
        beta.1 ~ dnorm(0, 1.0E-4)
    }
    tmpf <- tempfile()
    ## Auxiliary function writing a function body out as a BUGS model
    ## file (write.model() as in R2WinBUGS).
    write.model(Mod, tmpf)
    Mod.jags <- jags.model(tmpf,
                           data=with(Data, {
                               Y <- outer(Cat, levels(Cat)[-1], ">=")
                               list(X=X,
                                    Cat=unclass(Cat),
                                    nobs=nrow(Y),
                                    ncut=ncol(Y))}),
                           ## "Reasonable" (not too overdispersed) initial
                           ## values are *CRUCIAL* to model initialisation
                           inits=function() {
                               list(gamma.raw=sort(runif(8, min=-1, max=1)),
                                    beta.1=runif(1, min=0, max=1))
                           },
                           n.chains=3)
    update(Mod.jags, n.iter=10000)
    unlink(tmpf)
    Mod.coda <- coda.samples(Mod.jags,
                             variable.names=c("gamma",
                                              ## "beta.0",
                                              "beta.1",
                                              ## "gamma.adj",
                                              ## "lzeta",
                                              ## "pi.sgn",
                                              "deviance",
                                              "Cat.new"),
                             thin=10,
                             n.iter=10000)
    list(Mod.jags=Mod.jags, Mod.coda=Mod.coda)
}))
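For what it's worth, here is how I would tentatively extend the BUGS
model above to the mixed case (a sketch only, not tested: a random shift
u[k] per level of a hypothetical grouping factor Id, with nid levels,
both passed as data alongside X, Cat, nobs and ncut):

Mod.mixed <- function(){
    for (j in 1:ncut) {
        gamma.raw[j] ~ dnorm(0, 1.0E-4)
    }
    gamma <- sort(gamma.raw)
    ## Random shifts: one per group, acting on all thresholds at once.
    for (k in 1:nid) {
        u[k] ~ dnorm(0, tau.u)
    }
    tau.u ~ dgamma(1.0E-3, 1.0E-3)  ## conventional vague prior on precision
    for (i in 1:nobs) {
        Cat[i] ~ dcat(q[i,1:(ncut+1)])
        q[i,1] <- 1 - p[i,1]
        for (j in 2:ncut) {
            q[i,j] <- p[i,j-1] - p[i,j]
        }
        q[i,ncut+1] <- p[i,ncut]
        for (j in 1:ncut) {
            logit(p[i,j]) <- beta.1*(X[i] - gamma[j]) + u[Id[i]]
        }
    }
    beta.1 ~ dnorm(0, 1.0E-4)
}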