[R] multinomial conditional logit models
John Hendrickx
john_hendrickx at yahoo.com
Wed Jan 29 21:01:03 CET 2003
A multinomial logit model can be specified as a conditional logit
model after restructuring the data. Doing so gives flexibility in
imposing restrictions on the dependent variable. One application is
to specify a loglinear model for square tables, e.g. quasi-symmetry
or quasi-independence, as a multinomial logit model with covariates.
Further details on this technique and examples with several packages
are on my homepage at http://www.xs4all.nl/~jhckx/mcl/
I've been trying to implement an MCL model in R and have been mostly
succesful. However I'm still stuck on a few points and I'm hoping
someone can point out how to do fix them.
To estimate an MCL model, the data must be restructured into a
person-choice file.
* Each case must be duplicated "ncat" times ("ncat" is the number of
categories of the dependent variable)
* Each case must be identified by a strata variable (id)
* Each duplicate must be identified by a variable indexing the
choices (newy)
* A dichotomous variable must indicate which duplicate corresponds
with the respondent's choice (didep)
I've done this as follows:
mclgen <- function (datamat,catvar) {
ncat <- nlevels(catvar)
id<-1:length(catvar)
datamat<-cbind(id,datamat)
perschoice <-NULL
for (i in 1:ncat) {
perschoice<-rbind(perschoice,datamat)
}
perschoice<-perschoice[sort.list(perschoice$id),]
perschoice$newy<-gl(ncat,1,length=nrow(perschoice))
dep<-parse(text=paste("perschoice$",substitute(catvar),sep=""))
perschoice$depvar<-as.numeric(eval(dep)==perschoice$newy)
perschoice
}
This works but I wonder if it's the most efficient method. I tried
using "rep" rather than "rbind" in a loop but that replicated the
dataset horizontally, not vertically. Is this the best solution?
I also finally figure out how to include the argument "catvar" in a
transformation involving another dataset but this solution seems very
complicated (eval+parse+substitute). Is there a simpler solution?
What I haven't figured out is how to use "catvar" once again but on
the righthand side of the transformation. In the second to last line,
I'd like to do "perschoice$catvar<-perschoice$newy". I've tried
several possibities but I can't figure out how to let R substitute
the value of "catvar" in that expression. "eval(dep)" doesn't work as
it does for the lefthand side but how should it be done?
My solution for now is to do it afterward. My program continues as
follows:
pc<-mclgen(logan,occ)
pc$occ<-factor(pc$newy)
library(survival)
cl.lr<-clogit(depvar~occ+occ:educ+occ:black+strata(id),data=pc)
summary(cl.lr)
The dataset "logan" is available from the website. "occ" is the
dependent variable, "educ" and "black" are independents. Once
"mclgen" has transformed the data into a person-choice file, a
multinomial logit model can be estimated using the dichotomous
dependent variable, the main effects of "occ" for the intercept term
and interactions of "educ" and "black" with "occ" for the effects of
these variables.
This works more or less. What's unexpected though is that the
reference category of "occ" isn't dropped in the interactions with
"educ" and "black". The highest category is dropped due to
collineairity but that's not quite what I want. Could someone explain
why this happens and how to let R drop the first category?
A final problem is that the results of clogit are only approximate (2
digits) whereas in other packages the cl model produces exactly the
same estimates as an mnl model. Is this because terms are dropped due
to collinearity or are there options I should examine. clogit issues
the following warnings which seem to be related to the collinearity
but perhaps point to something else:
Warning messages:
1: Loglik converged before variable 10 ; beta may be infinite. in:
fitter(X, Y, strats, offset, init, control, weights = weights,
2: X matrix deemed to be singular; variable 9 14 in: coxph(formula =
Surv(rep(1, 4190), depvar) ~ occ + occ:educ +
Apologies for the length of this post. Hopefully someone will have
time to read it through and help me out. As always, advance thanks
for any assistance.
John Hendrickx
More information about the R-help
mailing list