[R] Follow-up about ordinal logit with mixtures: how about 'continuation ratio' strategy?
Paul Johnson
pauljohn32 at gmail.com
Thu May 10 07:23:08 CEST 2007
This is a follow up to the message I posted 3 days ago about how to
estimate mixed ordinal logit models. I hope you don't mind that I am
just pasting in the code and comments from an R file for your
feedback. Actual estimates are at the end of the post.
### Subject: mixed ordinal logit via "augmented" data setup.
### I've been interested in estimating an ordinal logit model with
### a random parameter. I asked in r-help about it. It appears to be
### a difficult problem because even well established commercial
### programs like SAS are prone to provide unreliable estimates.
### So far, I've found 3 avenues for research. 1) Go Bayesian and use
### MCMC to estimate the model. 2) Specify a likelihood function and
### then use R's optim function (as described in Laura A. Thompson,
### 2007, S-PLUS (and R) Manual to Accompany Agresti's Categorical
### Data Analysis (2002) 2nd edition). My guess is that either of
### those approaches would be worth the while, but I might have
### trouble persuading a target audience that they have good
### properties. 3) Adapt a "continuation ratio" approach.
### This latter approach was suggested by a post in r-help by Daniel
### Farewell <farewelld_at_Cardiff.ac.uk>
### http://tolstoy.newcastle.edu.au/R/help/06/08/32398.html#start
### It pointed me in the direction of "continuation ratio" logit models
### and one way to estimate an ordinal logit model with random
### parameters.
### Farewell's post gives working example code that shows a way to
### convert a K category ordinal variable into K-1 dichotomous
### indicators (a "continuation ratio" model). Those K-1 indicators
### can be "stacked" into one column and then a logistic regression
### program that is written for a two-valued output can be used.
### Farewell reasoned that one might then use a program for two-valued
### outputs including mixed effects. In his proposal, one would use
### the program lmer (package: lme4) ( a binomial family with a logit
### link) to estimate parameters for a dichotomous logit model with
### random parameters.
### This is the sort of magic trick I had suspected might be possible.
### Still, it is hard to believe it would work. But in the r-help
### response to the post by Farewell, there is no general objection
### against his modeling strategy.
### I had not studied "continuation ratio" logit models before, so I
### looked up a few articles on estimation of ordinal models by
### re-coding the output as a sequence of binary comparisons (stop
### ratios, continuation ratios, etc). The article that is most clear
### on how this can be done to estimate a proportional odds logistic
### model is
### Stephen R. Cole, Paul D. Allison, and Cande V. Ananth,
### Estimation of Cumulative Odds Ratios
### Ann Epidemiol 2004;14:172–178.
### They claim that one can recode an n-chotomy into n-1 dichotomous
### indicators. Each observation in the original dataset begets n-1
### lines in the augmented version. After creating the dichotomous
### indicator, one uses an ordinary dichotomous logit model to
### estimate parameters and cutpoints for an ordinal logit
### model. Cole, et al., are very clear.
### There is an additional benefit to the augmented data approach.
### One can explicitly test the proportional odds assumption by checking
### for interactions between the included independent variables and the
### "level" of the dependent variable being considered. The Cole
### article shows some examples where the proportional assumption appears
### to be violated.
### To test it, I created the following example. This shows the
### results of maximum likelihood estimation with the programs "polr"
### (package:MASS) and "lrm" (package: Design). The estimates from
### the augmented data approach are not exactly the same as polr or
### lrm, but they are close. It appears to me the claims about the
### augmented data approach are mostly correct. The parameter
### estimates are pretty close to the true values, while the estimates
### of the ordinal cutpoints are a bit difficult to interpret.
### I don't know what to make of the model diagnostics for the augmented
### data model. Should I have confidence in the standard errors?
#### How to interpret the degrees of freedom when 3 lines
### of data are manufactured from 1 observation? Are likelihood-ratio
### (anova) tests valid in this context? Are these estimates from the
### augmented data "equivalent to maximum likelihood"? What does it
### mean that the t-ratios are so different? That seems to be prima-facie
### evidence that the estimates based on the augmented data set are not
### trustworthy.
### Suppose I convince myself that the estimates of the ordinal model
### are "as good as" maximum likelihood. Is it reasonable to take the
### next step, and follow Farewell's idea of using this kind of model
### to estimate a mixture model? There are K-1 lines per case
### in the augmented data set. Suppose the observations were grouped
### into M sets and one hypothesized a random parameter across those M
### groups. Will the lmer (or other mixed model for dichotomous
### logits) be affected by the multiple lines?
### I'll probably stew over this and add a mixture component in the
### example below and find out how lmer does.
### File: ordLogit_Simulation-01.R
### Create a 4 category ordinal dependent variable Y valued 1,2,3,4
### with predictors "x" and "afactor". The "True" model has the
### linear predictor
### eta = 0.3 * x + 0 (afactor==1)+ 0.4 * (afactor==2) + 0.5
(afactor==3)+ 0.9 (afactor==4)
### and the "cutpoints" between categories are -2.3, -0.5, and 0.66.
N <- 1000
A <- -1
B <- 0.3
cutpoint <- c( -2.3, -0.5, 0.66 )
afactor <- gl (n=4, k = N/4)
afactorImpacts <- c( 0, 0.4, .5, .9 )
### Make "first" impact 0 so it does not confuse our study of
intercept & cutpoints.
### Any other value gets up getting added into the intercept and/or
first cutpoint.
set.seed(333)
x <- 1 + 10 * rnorm(N)
eta <- B * x + afactorImpacts[as.numeric(afactor)]
rand <- rlogis (N)
input <- eta + rand
y <- numeric(N)
dat <- data.frame(respno = 1:N, x, afactor, eta, rand, input, y )
#set y values
dat$y[ dat$input <= cutpoint[1] ] <- 1
dat$y[ cutpoint[1] < dat$input & dat$input <= cutpoint[2] ] <- 2
dat$y[ cutpoint[2] < dat$input & dat$input <= cutpoint[3] ] <- 3
dat$y[ cutpoint[3] <= dat$input ] <- 4
#convert to factor for analysis
dat$y <- factor(dat$y, ordered=T)
### Here's the "usual" proportional odds logistic regression
summary(polr(y ~ x + afactor, data=dat, Hess=T))
### Create a new augmented data frame as recommended by
### STEPHEN R. COLE, PAUL D. ALLISON, AND CANDE V. ANANTH,
### Estimation of Cumulative Odds Ratios
###
### Ann Epidemiol 2004;14:172–178.
### This new data frame has 3 lines for each original case,
### and a new dichotomous "response" variable called D
augmentedDat <- NULL
newdf <- dat
for (i in 2: length(levels(dat$y))) {
D <- ifelse ( dat$y >= i , 1, 0)
newone <- data.frame(newdf, i, D)
augmentedDat <- rbind(augmentedDat, newone)
}
### Compare previous POLR output to this model
summary(glm(D~ -1 + factor(i) + x + afactor , data=augmentedDat,
family=binomial))
### Well, the estimates for "x" and "afactor" are consistent with the
POLR output,
### but the estimates of the cutpoints come out with the unexpected
signs. No big deal,
### but I will have to figure it out.
### Might as well see what lrm says.
library(Design)
lrm (y ~ x + afactor, data=dat)
### Interesting. The cutpoint estimates come out with the "wrong"
sign, same as the
### augmented df estimates.
lrm( D ~ -1 + factor(i) + x + afactor, data=augmentedDat)
### Hmm. The estimates of factor(i) are, at first, baffling. It appears that
### the estimates for i=3 and i=4 are "delta" estimates--changes
against the intercept.
### I mean, the Intercept is an estimate of cutpoint[3] and the the estimate of
### cutpoint[2] = intercept + i=3
### and estimate of cutpoint[1] = intercept + i=2.
### Oh, and then reverse the signs.
===============================================================
In case you do not want to run this for yourself, here are the results
of the 4 regressions.
> summary(polr(y ~ x + afactor, data=dat, Hess=T))
Call:
polr(formula = y ~ x + afactor, data = dat, Hess = T)
Coefficients:
Value Std. Error t value
x 0.3194385 0.01534613 20.815566
afactor2 0.4829818 0.21012233 2.298574
afactor3 0.7186079 0.21405603 3.357102
afactor4 1.2058896 0.21511922 5.605681
Intercepts:
Value Std. Error t value
1|2 -2.2643 0.1782 -12.7059
2|3 -0.5304 0.1587 -3.3434
3|4 0.7312 0.1582 4.6229
Residual Deviance: 1502.426
AIC: 1516.426
> summary(glm(D~ -1 + factor(i) + x + afactor , data=augmentedDat, family=binomial))
Call:
glm(formula = D ~ -1 + factor(i) + x + afactor, family = binomial,
data = augmentedDat)
Deviance Residuals:
Min 1Q Median 3Q Max
-3.1563 -0.3113 0.1191 0.4392 2.9441
Coefficients:
Estimate Std. Error z value Pr(>|z|)
factor(i)2 2.29647 0.16395 14.007 < 2e-16 ***
factor(i)3 0.55764 0.13968 3.992 6.55e-05 ***
factor(i)4 -0.69938 0.13918 -5.025 5.03e-07 ***
x 0.31961 0.01265 25.256 < 2e-16 ***
afactor2 0.40112 0.16424 2.442 0.0146 *
afactor3 0.66854 0.16712 4.000 6.33e-05 ***
afactor4 1.16989 0.16995 6.884 5.82e-12 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 4158.9 on 3000 degrees of freedom
Residual deviance: 1837.1 on 2993 degrees of freedom
AIC: 1851.1
Number of Fisher Scoring iterations: 6
> lrm (y ~ x + afactor, data=dat)
Logistic Regression Model
lrm(formula = y ~ x + afactor, data = dat)
Frequencies of Responses
1 2 3 4
184 155 138 523
Obs Max Deriv Model L.R. d.f. P C Dxy
1000 6e-11 923.08 4 0 0.897 0.793
Gamma Tau-a R2 Brier
0.794 0.516 0.661 0.077
Coef S.E. Wald Z P
y>=2 2.2643 0.17821 12.71 0.0000
y>=3 0.5304 0.15865 3.34 0.0008
y>=4 -0.7312 0.15817 -4.62 0.0000
x 0.3194 0.01535 20.82 0.0000
afactor=2 0.4830 0.21012 2.30 0.0215
afactor=3 0.7186 0.21406 3.36 0.0008
afactor=4 1.2059 0.21512 5.61 0.0000
> lrm( D ~ -1 + factor(i) + x + afactor, data=augmentedDat)
Logistic Regression Model
lrm(formula = D ~ -1 + factor(i) + x + afactor, data = augmentedDat)
Frequencies of Responses
0 1
1000 2000
Obs Max Deriv Model L.R. d.f. P C Dxy
3000 2e-12 1981.99 6 0 0.933 0.867
Gamma Tau-a R2 Brier
0.868 0.385 0.671 0.097
Coef S.E. Wald Z P
Intercept 2.2965 0.16396 14.01 0.0000
i=3 -1.7388 0.16135 -10.78 0.0000
i=4 -2.9959 0.17392 -17.23 0.0000
x 0.3196 0.01266 25.25 0.0000
afactor=2 0.4011 0.16424 2.44 0.0146
afactor=3 0.6685 0.16713 4.00 0.0001
afactor=4 1.1699 0.16995 6.88 0.0000
--
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas
More information about the R-help
mailing list