[R] Follow-up about ordinal logit with mixtures: how about 'continuation ratio' strategy?

Paul Johnson pauljohn32 at gmail.com
Thu May 10 07:23:08 CEST 2007


This is a follow-up to the message I posted 3 days ago about how to
estimate mixed ordinal logit models.  I hope you don't mind that I am
just pasting in the code and comments from an R file for your
feedback.  Actual estimates are at the end of the post.

### Subject: mixed ordinal logit via "augmented" data setup.

### I've been interested in estimating an ordinal logit model with
### a random parameter.  I asked in r-help about it. It appears to be
### a difficult problem because even well-established commercial
### programs like SAS are prone to produce unreliable estimates.

### So far, I've found 3 avenues for research.  1) Go Bayesian and use
### MCMC to estimate the model.  2) Specify a likelihood function and
### then use R's optim function (as described in Laura A. Thompson,
### 2007, S-PLUS (and R) Manual to Accompany Agresti's Categorical
### Data Analysis (2002) 2nd edition).  My guess is that either of
### those approaches would be worthwhile, but I might have trouble
### persuading a target audience that the resulting estimates have
### good properties.  3) Adapt a "continuation ratio" approach.

### This latter approach was suggested by a post in r-help by Daniel
### Farewell <farewelld_at_Cardiff.ac.uk>
### http://tolstoy.newcastle.edu.au/R/help/06/08/32398.html#start
### It pointed me in the direction of "continuation ratio" logit models
### and one way to estimate an ordinal logit model with random
### parameters.

### Farewell's post gives working example code that shows a way to
### convert a K-category ordinal variable into K-1 dichotomous
### indicators (a "continuation ratio" model). Those K-1 indicators
### can be "stacked" into one column and then a logistic regression
### program that is written for a two-valued output can be used.
### Farewell reasoned that one might then use any program for
### two-valued outputs, including programs that allow mixed effects.
### In his proposal, one would use lmer (package: lme4), with a
### binomial family and a logit link, to estimate a dichotomous logit
### model with random parameters.

### This is the sort of magic trick I had suspected might be possible,
### though it is hard to believe it would work.  Still, in the r-help
### responses to Farewell's post, nobody raised a general objection
### to his modeling strategy.

### I had not studied "continuation ratio" logit models before, so I
### looked up a few articles on estimation of ordinal models by
### re-coding the output as a sequence of binary comparisons (stopping
### ratios, continuation ratios, etc.).  The article that is most clear
### on how this can be done to estimate a proportional odds logistic
### model is

### Stephen R. Cole, Paul D. Allison, and Cande V. Ananth,
### Estimation of Cumulative Odds Ratios
### Ann Epidemiol 2004;14:172–178.

### They claim that one can recode an n-chotomy into n-1 dichotomous
### indicators.  Each observation in the original dataset begets n-1
### lines in the augmented version.  After creating the dichotomous
### indicator, one uses an ordinary dichotomous logit model to
### estimate parameters and cutpoints for an ordinal logit
### model. Cole, et al., are very clear.
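### For instance, with the 4-category y used below, an observation
### with y = 3 contributes three rows to the augmented data: D = 1
### for the comparison y >= 2, D = 1 for y >= 3, and D = 0 for y >= 4.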

### There is an additional benefit to the augmented data approach.
### One can explicitly test the proportional odds assumption by checking
### for interactions between the included independent variables and the
### "level" of the dependent variable being considered.  The Cole
### article shows some examples where the proportional odds assumption
### appears to be violated.  (A concrete version of that check appears
### after the glm fit on the augmented data below.)

### To test it, I created the following example.  This shows the
### results of maximum likelihood estimation with the programs "polr"
### (package: MASS) and "lrm" (package: Design).  The estimates from
### the augmented data approach are not exactly the same as polr or
### lrm, but they are close.  It appears to me the claims about the
### augmented data approach are mostly correct.  The parameter
### estimates are pretty close to the true values, while the estimates
### of the ordinal cutpoints are a bit difficult to interpret.

### I don't know what to make of the model diagnostics for the augmented
### data model. Should I have confidence in the standard errors?
### How should one interpret the degrees of freedom when 3 lines
### of data are manufactured from 1 observation?  Are likelihood-ratio
### (anova) tests valid in this context?  Are these estimates from the
### augmented data "equivalent to maximum likelihood"?  What does it
### mean that the t-ratios are so different?  That seems to be prima facie
### evidence that the estimates based on the augmented data set are not
### trustworthy.

### Suppose I convince myself that the estimates of the ordinal model
### are "as good as" maximum likelihood.  Is it reasonable to take the
### next step, and follow Farewell's idea of using this kind of model
### to estimate a mixture model?  There are K-1 lines per case
### in the augmented data set. Suppose the observations were grouped
### into M sets and one hypothesized a random parameter across those M
### groups.  Will lmer (or another mixed-model routine for dichotomous
### logits) be affected by the multiple lines per case?

### I'll probably stew over this and add a mixture component in the
### example below and find out how lmer does.


### File: ordLogit_Simulation-01.R
### Create a 4 category ordinal dependent variable Y valued 1,2,3,4
### with predictors "x" and "afactor".  The "True" model has the
### linear predictor

### eta = 0.3*x + 0*(afactor==1) + 0.4*(afactor==2)
###       + 0.5*(afactor==3) + 0.9*(afactor==4)
### and the "cutpoints" between categories are -2.3, -0.5, and 0.66.


N <- 1000
B <- 0.3                            # true slope on x
cutpoint <- c( -2.3, -0.5, 0.66 )   # true cutpoints between categories


afactor <- gl(n=4, k=N/4)           # 4 equal blocks of 250 cases
afactorImpacts <- c( 0, 0.4, 0.5, 0.9 )

### Make "first" impact 0 so it does not confuse our study of
intercept & cutpoints.
### Any other value gets up getting added into the intercept and/or
first cutpoint.

set.seed(333)
x <- 1 + 10 * rnorm(N)
eta <- B * x  + afactorImpacts[as.numeric(afactor)]
rand <- rlogis(N)
input <- eta + rand
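### "input" is the latent variable: the linear predictor plus a
### standard logistic disturbance.  Cutting it at the cutpoints below
### yields the ordinal y, the usual latent-variable motivation for the
### proportional odds model.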

y <- numeric(N)

dat <- data.frame(respno = 1:N, x, afactor,  eta, rand, input,  y )
#set y values
dat$y[ dat$input <= cutpoint[1] ] <- 1
dat$y[ cutpoint[1] < dat$input & dat$input <= cutpoint[2] ] <- 2
dat$y[ cutpoint[2] < dat$input & dat$input <= cutpoint[3] ] <- 3
dat$y[ cutpoint[3] < dat$input ] <- 4
#convert to factor for analysis
dat$y <- factor(dat$y, ordered=TRUE)

### Here's the "usual" proportional odds logistic regression
library(MASS)
summary(polr(y ~ x + afactor, data=dat, Hess=T))

### Create a new augmented data frame as recommended by
### Stephen R. Cole, Paul D. Allison, and Cande V. Ananth,
### Estimation of Cumulative Odds Ratios
###
### Ann Epidemiol 2004;14:172–178.
### This new data frame has 3 lines for each original case,
### and a new dichotomous "response" variable called D

augmentedDat <- NULL

for (i in 2:length(levels(dat$y))) {
  D <- ifelse(dat$y >= i, 1, 0)     # 1 if this case reached level i or higher
  newone <- data.frame(dat, i, D)
  augmentedDat <- rbind(augmentedDat, newone)
}
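
### Quick sanity check: 3 rows per original case, 3000 rows in all,
### and D should split 1000 zeros to 2000 ones (compare the lrm
### "Frequencies of Responses" below).
dim(augmentedDat)
with(augmentedDat, table(i, D))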

### Compare previous POLR output to this model
summary(glm(D ~ -1 + factor(i) + x + afactor, data=augmentedDat,
            family=binomial))
### Well, the estimates for "x" and "afactor" are consistent with the
### POLR output, but the estimates of the cutpoints come out with
### unexpected signs.  No big deal, but I will have to figure it out.
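
### As promised above, here is a concrete version of the proportional
### odds check: let x and afactor interact with the level indicator
### and compare deviances (results not pasted below).  Given my
### questions above, I am not certain the likelihood-ratio test is
### strictly valid on rows manufactured from the same case.
glmProp <- glm(D ~ -1 + factor(i) + x + afactor,
               data=augmentedDat, family=binomial)
glmFull <- glm(D ~ -1 + factor(i)*(x + afactor),
               data=augmentedDat, family=binomial)
anova(glmProp, glmFull, test="Chisq")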


### Might as well see what lrm says.
library(Design)

lrm (y ~ x + afactor, data=dat)
### Interesting. The cutpoint estimates come out with the "wrong" sign,
### same as the augmented df estimates.



lrm( D ~ -1 + factor(i) + x + afactor, data=augmentedDat)
### Hmm. The estimates of factor(i) are, at first, baffling.  It appears
### that the estimates for i=3 and i=4 are "delta" estimates--changes
### against the Intercept, which here plays the role of the i=2 level.
### I mean, after reversing signs, the Intercept estimates cutpoint[1],
### Intercept + (the i=3 coefficient) estimates cutpoint[2], and
### Intercept + (the i=4 coefficient) estimates cutpoint[3].
### Check: -(2.2965) ~ cutpoint[1] = -2.3, -(2.2965 - 1.7388) =
### -0.5577 ~ cutpoint[2] = -0.5, and -(2.2965 - 2.9959) = 0.6994 ~
### cutpoint[3] = 0.66.
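
### Finally, a sketch of the mixed-model step contemplated above (not
### run yet, so no results below).  The grouping factor is artificial:
### nothing in the simulated data varies by group, so the estimated
### variance ought to be near zero.  It only makes the lmer call
### concrete.  All 3 augmented rows from one case must share a group,
### so groups are built from respno.
M <- 50                             # hypothetical number of groups
augmentedDat$grp <- factor((augmentedDat$respno - 1) %/% (N/M))
library(lme4)
lmer(D ~ -1 + factor(i) + x + afactor + (1 | grp),
     data=augmentedDat, family=binomial)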

===============================================================

In case you do not want to run this for yourself, here are the results
of the 4 regressions.


> summary(polr(y ~ x + afactor, data=dat, Hess=T))
Call:
polr(formula = y ~ x + afactor, data = dat, Hess = T)

Coefficients:
             Value Std. Error   t value
x        0.3194385 0.01534613 20.815566
afactor2 0.4829818 0.21012233  2.298574
afactor3 0.7186079 0.21405603  3.357102
afactor4 1.2058896 0.21511922  5.605681

Intercepts:
    Value    Std. Error t value
1|2  -2.2643   0.1782   -12.7059
2|3  -0.5304   0.1587    -3.3434
3|4   0.7312   0.1582     4.6229

Residual Deviance: 1502.426
AIC: 1516.426



> summary(glm(D~ -1 + factor(i) + x + afactor , data=augmentedDat, family=binomial))

Call:
glm(formula = D ~ -1 + factor(i) + x + afactor, family = binomial,
    data = augmentedDat)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-3.1563  -0.3113   0.1191   0.4392   2.9441

Coefficients:
           Estimate Std. Error z value Pr(>|z|)
factor(i)2  2.29647    0.16395  14.007  < 2e-16 ***
factor(i)3  0.55764    0.13968   3.992 6.55e-05 ***
factor(i)4 -0.69938    0.13918  -5.025 5.03e-07 ***
x           0.31961    0.01265  25.256  < 2e-16 ***
afactor2    0.40112    0.16424   2.442   0.0146 *
afactor3    0.66854    0.16712   4.000 6.33e-05 ***
afactor4    1.16989    0.16995   6.884 5.82e-12 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 4158.9  on 3000  degrees of freedom
Residual deviance: 1837.1  on 2993  degrees of freedom
AIC: 1851.1

Number of Fisher Scoring iterations: 6





> lrm (y ~ x + afactor, data=dat)

Logistic Regression Model

lrm(formula = y ~ x + afactor, data = dat)


Frequencies of Responses
  1   2   3   4
184 155 138 523

       Obs  Max Deriv Model L.R.       d.f.          P          C        Dxy
      1000      6e-11     923.08          4          0      0.897      0.793
     Gamma      Tau-a         R2      Brier
     0.794      0.516      0.661      0.077

          Coef    S.E.    Wald Z P
y>=2       2.2643 0.17821 12.71  0.0000
y>=3       0.5304 0.15865  3.34  0.0008
y>=4      -0.7312 0.15817 -4.62  0.0000
x          0.3194 0.01535 20.82  0.0000
afactor=2  0.4830 0.21012  2.30  0.0215
afactor=3  0.7186 0.21406  3.36  0.0008
afactor=4  1.2059 0.21512  5.61  0.0000



> lrm( D ~ -1 + factor(i) + x + afactor, data=augmentedDat)

Logistic Regression Model

lrm(formula = D ~ -1 + factor(i) + x + afactor, data = augmentedDat)


Frequencies of Responses
   0    1
1000 2000

       Obs  Max Deriv Model L.R.       d.f.          P          C        Dxy
      3000      2e-12    1981.99          6          0      0.933      0.867
     Gamma      Tau-a         R2      Brier
     0.868      0.385      0.671      0.097

          Coef    S.E.    Wald Z P
Intercept  2.2965 0.16396  14.01 0.0000
i=3       -1.7388 0.16135 -10.78 0.0000
i=4       -2.9959 0.17392 -17.23 0.0000
x          0.3196 0.01266  25.25 0.0000
afactor=2  0.4011 0.16424   2.44 0.0146
afactor=3  0.6685 0.16713   4.00 0.0001
afactor=4  1.1699 0.16995   6.88 0.0000


-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas


