[R] Error when running Conditional Logit Model

Charles C. Berry cberry at tajo.ucsd.edu
Sat Dec 19 06:24:47 CET 2009


On Fri, 18 Dec 2009, Hien Nguyen wrote:

> Thanks a lot for answering my questions.
>
> I have tried to run the clogit for only 64 observations and 4 independent 
> variables and the results come back instantly. However, when I run the same 
> command (with only 4 independent variables) on the full data, it has been 
> running for 50 minutes now. :(
>
> Thomas, what do you mean by "maximizing the unconditional likelihood is fine 
> when the stratum sizes are large"? What I put in "strata (__)" is actually 
> the possible choices (1-64). Each choice is recorded more than 4000 
> times (which means I have more than 4000 values of 1, 4000 values of 2 and so 
> on).
> Does it sound right?

So you have 64 cases and more than 250000 controls.

Large strata will really slow down clogit. But I think that that isn't 
your problem.

If the strata really matter - in the sense that the conditional 
distributions of covariates for controls vary a lot from stratum to 
stratum - then you really gain little by having more than a handful of 
controls for each case. If that is the situation you are in, sampling a 
couple of dozen controls from the stratum of each case will give you 
results that are very nearly as precise as those obtained from using all 
4000 of them:

 	# relative variance 1 + 1/n for n = 1, ..., 100 controls per case
 	plot( 1:100, 1 + 1/(1:100), xlab='n of controls',
 		ylab='relative variance of coef' )


will give you a rough idea of the impact of increasing the number of 
controls per case. The relative variance with 1 control per case is 2; at 
the asymptote it is 1.
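
To put a few numbers on that curve, here is a minimal sketch that just
evaluates the same 1 + 1/n rule of thumb at selected control counts. The
names n_ctrl and rel_var are illustrative only.

```r
# Relative variance of a coefficient with n controls per case,
# using the same 1 + 1/n rule of thumb as the plot.
n_ctrl  <- c(1, 2, 5, 10, 24, 100)
rel_var <- 1 + 1 / n_ctrl

print(data.frame(controls = n_ctrl, rel_var = round(rel_var, 3)))
```

With a couple of dozen controls per case the relative variance is already
within about 5% of the asymptote.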

So you can probably speed things up a lot by using fewer controls with 
little loss in accuracy.
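
A rough sketch of that subsampling, under loud assumptions: the data frame
dat, its columns stratvar, status and x, and the choice of k = 24 controls
are all made up for illustration, and the data are simulated noise rather
than anything resembling the real problem.

```r
library(survival)
set.seed(1)

# Hypothetical long-format data: 64 strata, each with 1 case and 4000 controls.
n_strata <- 64
n_ctrl   <- 4000
dat <- data.frame(
  stratvar = factor(rep(seq_len(n_strata), each = n_ctrl + 1)),
  status   = rep(c(1, rep(0, n_ctrl)), times = n_strata),
  x        = rnorm(n_strata * (n_ctrl + 1))
)

# Keep every case; keep at most k randomly chosen controls per stratum.
k <- 24
keep <- unlist(lapply(split(seq_len(nrow(dat)), dat$stratvar), function(idx) {
  cases    <- idx[dat$status[idx] == 1]
  controls <- idx[dat$status[idx] == 0]
  picked   <- controls[sample.int(length(controls), min(k, length(controls)))]
  c(cases, picked)
}))
sub <- dat[sort(keep), ]

# Fit the conditional logit on the reduced data set.
fit <- clogit(status ~ x + strata(stratvar), data = sub)
print(table(sub$status))
print(coef(fit))
```

The sampling is done within each stratum so that every case keeps controls
from its own stratum; sample.int() is used on the positions to avoid the
surprise sample() gives when handed a single number.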

With only 64 cases you cannot fit terribly complicated models. This holds 
whether you approach things conditionally using clogit or unconditionally 
using glm. Fourteen degrees of freedom for regression is probably pushing 
matters.  ridge() is helpful in taming overlarge regressor sets in clogit, 
but you'll need to use survival:::summary.coxph.penal() on the result (or 
tinker with the class attribute).

BTW, when you say 'strata(___)', I hope you mean that you use something 
like 'strata( stratvar )' where stratvar is a factor that encodes the 
64 levels.
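
For concreteness, a small sketch of that call using the infert data that
ships with R (the same data used for the test cases earlier in this
thread). The copy d and the column name stratvar are only for
illustration; the check on the response reflects the 'Invalid status
value' error seen when the outcome is not 0/1.

```r
library(survival)

# Work on a copy of infert and encode the matched sets as a factor.
d <- infert
d$stratvar <- factor(d$stratum)

# The response must be a 0/1 (or logical) status, not a proportion.
is_binary <- all(d$case %in% c(0, 1))
stopifnot(is_binary)

fit <- clogit(case ~ spontaneous + induced + strata(stratvar), data = d)
print(coef(fit))
```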

HTH,

Chuck

>
> Thanks a lot
>
> Hien
>
> tlumley at u.washington.edu wrote:
>>  On Fri, 18 Dec 2009, Hien Nguyen wrote:
>> 
>> >  Dear Drs Winsemius and Berry,
>> > 
>> >  Thanks a lot for your comment and suggestions on running my model. I am 
>> >  not just new to R but new to CLM as well. :( With your suggestions, I 
>> >  figure out that I have huge misunderstandings on the model and data 
>> >  arrangement.
>> > 
>> >  After my finals, I reread the related materials on CLM and rearranged 
>> >  the data appropriately before running the model in R. This time, I 
>> >  have a data set of more than 250,000 observations (created from more 
>> >  than 4000 responses) and a model of 15 predictors.
>> > 
>> >  My question is how long the clogit command should take to run. It has 
>> >  been running for more than 10 hours on a quad-core computer and still 
>> >  shows no sign of being done or almost done. Is that normal, or does 
>> >  my command just not work?
>>
>>  If you have a lot of records with case=1 in a stratum, conditional
>>  logistic regression will be extremely slow.   And unnecessary: maximizing
>>  the unconditional likelihood is fine when the stratum sizes are large.
>>
>>  Note that a quad-core computer won't help. Only one core will be used in
>>  the computations.
>>
>>       -thomas
>> 
>> 
>> 
>> 
>> >  Thanks a lot for your response
>> > 
>> >  Hien
>> > 
>> > 
>> >  Charles C. Berry wrote:
>> > >  On Fri, 4 Dec 2009, David Winsemius wrote:
>> > > 
>> > > > 
>> > > >  On Dec 4, 2009, at 5:49 PM, Hien Nguyen wrote:
>> > > > 
>> > > > >  Dear Dr. Winsemius,
>> > > > > 
>> > > > >  Thank you very much for your reply.
>> > > > > 
>> > > > >  I have tried many possible combinations (even with the model of 
>> > > > >  only 2 predictors) but it produces the same message. With more 
>> > > > >  than 4000 observations, I think 14 predictors might not be too 
>> > > > >  many.
>> > > > 
>> > > >  It is what happens in the factor combinations that concerns me. I 
>> > > >  am guessing that some of those predictors are factors. You really 
>> > > >  should not ask r-help questions without providing better 
>> > > >  descriptions of both the outcomes and the predictor variables.
>> > > > 
>> > > > > 
>> > > > >  Although my dependent variable (Pin) is not discrete (it ranges 
>> > > > >  from 0 to 1), I do not think it will create problems for the 
>> > > > >  estimation, but I'm not sure.
>> > > > 
>> > > >  I would think it _would_ cause problems. As I understand it, 
>> > > >  conditional methods create contingency tables. Why are you using an 
>> > > >  outcome type that is not consistent with the fundamental regression 
>> > > >  assumptions of the clogit function?
>> > > > 
>> > > >  I do not get that particular error when I munge the infert dataset 
>> > > >  to have case be a random uniform value, but I do get an error.
>> > > > >   infert$case <- runif(nrow(infert))
>> > > > >   clogit(case~spontaneous+induced+strata(stratum),data=infert)
>> > > >  Error in Surv(rep(1, 248L), case) : Invalid status value
>> > > > 
>> > > 
>> > >  David, I think you were on the right track. I get this:
>> > > 
>> > >  -----------
>> > > >  clogit(I(case*runif(length(case)))~spontaneous+induced+strata(ifelse(stratum>40,NA,stratum)),data=infert) 
>> > > 
>> > >  Error in fitter(X, Y, strats, offset, init, control, weights = weights,  :
>> > >    NA/NaN/Inf in foreign function call (arg 6)
>> > >  In addition: Warning messages:
>> > >  1: In Surv(rep(1, 248L), I(case * runif(length(case)))) :
>> > >    Invalid status value, converted to NA
>> > >  2: In fitter(X, Y, strats, offset, init, control, weights = weights,  :
>> > >    Ran out of iterations and did not converge
>> > > > 
>> > >  ------------
>> > > 
>> > >  which looks pretty much the same as Hien's error msg
>> > > 
>> > >  So Hien needs to create a logical status value.
>> > > 
>> > >  Chuck
>> > > 
>> > >  p.s.
>> > > 
>> > > >  sessionInfo()
>> > >  R version 2.10.0 (2009-10-26)
>> > >  i386-pc-mingw32
>> > > 
>> > >  locale:
>> > >  [1] LC_COLLATE=English_United States.1252
>> > >  [2] LC_CTYPE=English_United States.1252
>> > >  [3] LC_MONETARY=English_United States.1252
>> > >  [4] LC_NUMERIC=C
>> > >  [5] LC_TIME=English_United States.1252
>> > > 
>> > >  attached base packages:
>> > >  [1] splines   stats     graphics  grDevices utils     datasets  methods
>> > >  [8] base
>> > > 
>> > >  other attached packages:
>> > >  [1] survival_2.35-7
>> > > 
>> > >  loaded via a namespace (and not attached):
>> > >  [1] tools_2.10.0
>> > > > 
>> > > 
>> > > 
>> > > >  So I certainly would not have proceeded to submit a full analysis to 
>> > > >  clogit if I could not get a test case to run under the situation you 
>> > > >  propose.
>> > > > 
>> > > >  -- 
>> > > >  David
>> > > > 
>> > > > > 
>> > > > >  I have checked the collinearity among predictors and they are all 
>> > > > >  < 0.5 (which I think is OK). Do you know what else could cause 
>> > > > >  this error?
>> > > > > 
>> > > > >  Thanks a lot
>> > > > > 
>> > > > >  Hien Nguyen
>> > > > > 
>> > > > >  David Winsemius wrote:
>> > > > > >  On Dec 4, 2009, at 9:22 AM, Hien Nguyen wrote:
>> > > > > > 
>> > > > > > >  Dear R-helpers,
>> > > > > > > 
>> > > > > > >  I am very new to R and trying to run the conditional logit
>> > > > > > >  model using "clogit " command.
>> > > > > > >  I have more than 4000 observations in my dataset and try to
>> > > > > > >  predict the dependent variable from 14 independent variables.
>> > > > > > >  My command is as follows
>> > > > > > > 
>> > > > > > >  clmtest1 <-
>> > > > > > >  clogit(Pin~Income+Bus+Pop+Urbpro+Health+Student+Grad+NE+NW+NCC+SCC+CH+SE+MRD+strata(IDD),data=clmdata)
>> > > > > > > 
>> > > > > > >  However, it produces the following errors:
>> > > > > > > 
>> > > > > > >  Error in fitter(X, Y, strats, offset, init, control, weights = weights, :
>> > > > > > >  NA/NaN/Inf in foreign function call (arg 6)
>> > > > > > >  In addition: Warning messages:
>> > > > > > >  1: In Surv(rep(1, 4096L), Pinmig) : Invalid status value, converted to NA
>> > > > > > >  2: In fitter(X, Y, strats, offset, init, control, weights = weights, :
>> > > > > > >  Ran out of iterations and did not converge
>> > > > > > > 
>> > > > > > >  I search the error message from R forums but it does not say
>> > > > > > >  anything for Conditional Logit Model.
>> > > > > > 
>> > > > > >  With that many predictors in a small dataset, you may have
>> > > > > >  created matrix singularities. Perhaps you created a stratum
>> > > > > >  where all of the subjects experience the event and others where
>> > > > > >  none did so. The coefficients might be driven to infinities. Try
>> > > > > >  simplifying the model.
>> > > > > > 
>> > > > > > >  Please check for me what it says and what should I do to
>> > > > > > >  solve it.
>> > > > > > > 
>> > > > 
>> > > >  David Winsemius, MD
>> > > >  Heritage Laboratories
>> > > >  West Hartford, CT
>> > > > 
>> > > >  ______________________________________________
>> > > >  R-help at r-project.org mailing list
>> > > >  https://stat.ethz.ch/mailman/listinfo/r-help
>> > > >  PLEASE do read the posting guide 
>> > > >  http://www.R-project.org/posting-guide.html
>> > > >  and provide commented, minimal, self-contained, reproducible code.
>> > > > 
>> > > 
>> > >  Charles C. Berry                            (858) 534-2098
>> > >                                              Dept of Family/Preventive Medicine
>> > >  E mailto:cberry at tajo.ucsd.edu                UC San Diego
>> > >  http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
>> > > 
>> > > 
>> > 
>> > 
>>
>>  Thomas Lumley            Assoc. Professor, Biostatistics
>>  tlumley at u.washington.edu    University of Washington, Seattle
>> 
>
>
>

Charles C. Berry                            (858) 534-2098
                                             Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu	            UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901



