[R] Error when running Conditional Logit Model
Hien Nguyen
hunghien2 at gmail.com
Sat Dec 19 11:36:31 CET 2009
On 12/18/09 22:24, Charles C. Berry wrote:
> On Fri, 18 Dec 2009, Hien Nguyen wrote:
>
>> Thanks a lot for answering my questions.
>>
>> I have tried to run clogit for only 64 observations and 4
>> independent variables, and the results come back instantly. However,
>> when I run the same command (with only 4 independent variables) on the
>> full data, it has now been running for 50 minutes. :(
>>
>> Thomas, what do you mean by "maximizing the unconditional likelihood
>> is fine when the stratum sizes are large"? What I put in "strata
>> (__)" is actually the possible choices (1-64). Each choice is
>> recorded more than 4000 times (which means I have more than 4000
>> values of 1, 4000 values of 2 and so on).
>> Does it sound right?
>
> So you have 64 cases and more than 250000 controls.
>
No, I have 4096 cases and more than 250000 controls. Each case
results in 63 controls (which I have to create from each case).
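
A minimal sketch of that expansion, for illustration only. The data frame `cases` and the column names `case_id`, `chosen_alt`, `alt` and `status` are hypothetical and are not the names in my real data; the only facts taken from the thread are the 64 possible choices and the 63 controls per case.

```r
# Hypothetical toy input: one row per case, with the alternative it chose.
n_alt <- 64
cases <- data.frame(case_id = 1:3, chosen_alt = c(5, 64, 17))

# merge() with no shared column names returns the Cartesian product,
# so every case is paired with all 64 alternatives.
long <- merge(cases, data.frame(alt = seq_len(n_alt)))

# status is TRUE for the chosen alternative (the case row) and
# FALSE for the 63 unchosen alternatives (the control rows).
long$status <- long$alt == long$chosen_alt
long <- long[order(long$case_id, long$alt), ]

table(long$status)

# Same arithmetic for the full data: 4096 cases with 63 controls each.
4096 * 63
```

With 4096 cases this layout has 4096 * 64 = 262144 rows, of which 258048 are controls.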
> Large strata will really slow down clogit. But I think that that isn't
> your problem.
>
> If the strata really matter - in the sense that the conditional
> distributions of covariates for controls vary a lot from stratum to
> stratum - then you really gain little by having more than a handful of
> controls for each case. If that is the situation you are in, sampling
> a couple of dozen controls from the stratum of each case will give you
> results that are very nearly as precise as those obtained from using
> all 4000 of them:
>
> plot( 1:100, (1 + 1/1:100), xlab='n of controls',
> ylab='relative variance of coef' )
>
>
> will give you rough idea of the impact of increasing the number of
> controls per case. The variance with 1 control per case is 2; at the
> asymptote it is 1.
>
> So you can probably speed things up a lot by using fewer controls with
> little loss in accuracy.
I think I might need to use this.
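
A rough sketch of how the control sampling could be done, assuming one matched set per case. The helper `sample_controls()` and the columns `set_id`, `case` and `x` are hypothetical names made up for this example; `rel_var()` is just the 1 + 1/m curve from the plot above.

```r
# Hypothetical helper: keep every case row and at most m randomly chosen
# control rows from the same matched set.
sample_controls <- function(d, m, set_col = "set_id", case_col = "case") {
  rows_by_set <- split(seq_len(nrow(d)), d[[set_col]])
  keep <- unlist(lapply(rows_by_set, function(idx) {
    is_case <- d[[case_col]][idx] == 1
    ctrl <- idx[!is_case]
    # Index into ctrl rather than calling sample(ctrl, m), which would
    # misbehave if ctrl happened to be a single number.
    if (length(ctrl) > m) ctrl <- ctrl[sample.int(length(ctrl), m)]
    c(idx[is_case], ctrl)
  }), use.names = FALSE)
  d[sort(keep), , drop = FALSE]
}

set.seed(1)
# Toy data: 5 matched sets, each with 1 case and 63 controls.
toy <- data.frame(set_id = rep(1:5, each = 64),
                  case   = rep(c(1, rep(0, 63)), times = 5),
                  x      = rnorm(5 * 64))

small <- sample_controls(toy, m = 24)
table(small$set_id, small$case)

# Approximate relative variance of a coefficient with m controls per case.
rel_var <- function(m) 1 + 1 / m
rel_var(c(1, 24, 63))
```

Going from 63 to 24 controls per case moves the approximate relative variance from about 1.016 to about 1.042, which is the "little loss in accuracy" described above.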
>
> With only 64 cases you cannot fit terribly complicated models. This
> holds whether you approach things conditionally using clogit or
> unconditionally using glm. Fourteen degrees of freedom for regression
> is probably pushing matters. ridge() is helpful in taming overlarge
> regressor sets in clogit, but you'll need to use
> survival:::summary.coxph.penal() on the result (or tinker with the
> class attribute).
>
I am still letting the program run. Even for the case of 4 df, it has not
produced a result yet.
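
For reference, the unconditional approach mentioned in this thread (glm with one intercept per stratum) can be sketched as below. This is simulated data, not my data: the 4 strata, the 500 observations per stratum, the intercepts and the slope of 0.5 are all assumptions chosen only to make the example run.

```r
set.seed(42)

# Simulated data (illustrative assumptions): 4 large strata of 500
# observations each, one covariate x, true log-odds slope 0.5.
n_per <- 500
stratvar <- factor(rep(1:4, each = n_per))
x <- rnorm(4 * n_per)
intercepts <- c(-1, -0.5, 0, 0.5)[stratvar]  # one baseline per stratum
y <- rbinom(4 * n_per, size = 1, prob = plogis(intercepts + 0.5 * x))

# Unconditional fit: the factor term gives each stratum its own intercept.
fit_glm <- glm(y ~ x + stratvar, family = binomial)
coef(fit_glm)["x"]
```

This is only sensible when the strata are few and large, as in the simulation; with many small matched sets the stratum intercepts become nuisance parameters and the conditional fit is the usual choice.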
> BTW, when you say 'strata(___)', I hope you mean that you use
> something like 'strata( stratvar )' where stratvar is a factor that
> encodes the 64 levels.
>
Yes, that's what I mean. Thank you.
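
A small sketch of the call shape being discussed, using the infert data that ships with R rather than my own data. The column name `stratvar` is added here purely for illustration; the check on the status column reflects the earlier advice in this thread that clogit needs a 0/1 or logical status value.

```r
library(survival)  # provides clogit() and strata()

data(infert)  # example data shipped with R: case is 1 for cases, 0 for controls

# clogit() needs a 0/1 (or logical) status; a fractional outcome is what
# produced the "Invalid status value" warning quoted in this thread.
stopifnot(all(infert$case %in% c(0, 1)))

# Wrap the stratum identifier in strata(); storing it as a factor first
# is optional but makes the grouping explicit.
infert$stratvar <- factor(infert$stratum)

fit <- clogit(case ~ spontaneous + induced + strata(stratvar), data = infert)
summary(fit)
```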
> HTH,
>
> Chuck
>
>>
>> Thanks a lot
>>
>> Hien
>>
>> tlumley at u.washington.edu wrote:
>>> On Fri, 18 Dec 2009, Hien Nguyen wrote:
>>>
>>> > Dear Drs Winsemius and Berry,
>>> >
>>> > Thanks a lot for your comments and suggestions on running my model. I am
>>> > not just new to R but new to CLM as well. :( With your suggestions, I
>>> > figured out that I had huge misunderstandings about the model and the
>>> > data arrangement.
>>> >
>>> > After my finals, I read the related materials on CLM again and
>>> > rearranged the data in an appropriate way before running the model in R.
>>> > This time, I have a data set of more than 250,000 observations (created
>>> > from more than 4000 responses) and a model of 15 predictors.
>>> >
>>> > My question is how long it should take for the clogit command to run,
>>> > because it has been running for more than 10 hours on a quad-core
>>> > computer and still doesn't show any sign of being done or almost done.
>>> > Is that OK, or does my command just not work?
>>>
>>> If you have a lot of records with case=1 in a stratum, conditional
>>> logistic regression will be extremely slow. And unnecessary: maximizing
>>> the unconditional likelihood is fine when the stratum sizes are large.
>>>
>>> Note that a quad-core computer won't help. Only one core will be used in
>>> the computations.
>>>
>>> -thomas
>>>
>>>
>>>
>>>
>>> > Thanks a lot for your response
>>> >
>>> > Hien
>>> >
>>> > Charles C. Berry wrote:
>>> > > On Fri, 4 Dec 2009, David Winsemius wrote:
>>> > >
>>> > > > On Dec 4, 2009, at 5:49 PM, Hien Nguyen wrote:
>>> > > >
>>> > > > > Dear Dr. Winsemius,
>>> > > > >
>>> > > > > Thank you very much for your reply.
>>> > > > >
>>> > > > > I have tried many possible combinations (even with the model of
>>> > > > > only 2 predictors) but it produces the same message. With more
>>> > > > > than 4000 observations, I think 14 predictors might not be too
>>> > > > > many.
>>> > > >
>>> > > > It is what happens in the factor combinations that concern me. I am
>>> > > > guessing that some of those predictors are factors. You really
>>> > > > should not ask r-help questions without providing better
>>> > > > descriptions of both the outcomes and the predictor variables.
>>> > > >
>>> > > > > Although my dependent variable (Pin) is not discrete (it ranges
>>> > > > > from 0 to 1), I do not think it will create problems to the
>>> > > > > estimation but I'm not sure
>>> > > >
>>> > > > I would think it _would_ cause problems. As I understand it,
>>> > > > conditional methods create contingency tables. Why are you using an
>>> > > > outcome type that is not consistent with the fundamental regression
>>> > > > assumptions of the clogit function?
>>> > > >
>>> > > > I do not get that particular error when I munge the infert dataset
>>> > > > to have case be a random uniform value, but I do get an error.
>>> > > >
>>> > > > > infert$case <- runif(nrow(infert))
>>> > > > > clogit(case~spontaneous+induced+strata(stratum),data=infert)
>>> > > > Error in Surv(rep(1, 248L), case) : Invalid status value
>>> > >
>>> > > David, I think you were on the right track. I get this:
>>> > >
>>> > > -----------
>>> > > > clogit(I(case*runif(length(case)))~spontaneous+induced+strata(ifelse(stratum>40,NA,stratum)),data=infert)
>>> > > Error in fitter(X, Y, strats, offset, init, control, weights = weights, :
>>> > >   NA/NaN/Inf in foreign function call (arg 6)
>>> > > In addition: Warning messages:
>>> > > 1: In Surv(rep(1, 248L), I(case * runif(length(case)))) :
>>> > >   Invalid status value, converted to NA
>>> > > 2: In fitter(X, Y, strats, offset, init, control, weights = weights, :
>>> > >   Ran out of iterations and did not converge
>>> > > ------------
>>> > >
>>> > > which looks pretty much the same as Hien's error msg
>>> > >
>>> > > So Hien needs to create a logical status value.
>>> > >
>>> > > Chuck
>>> > >
>>> > > p.s.
>>> > >
>>> > > > sessionInfo()
>>> > > R version 2.10.0 (2009-10-26)
>>> > > i386-pc-mingw32
>>> > >
>>> > > locale:
>>> > > [1] LC_COLLATE=English_United States.1252
>>> > > [2] LC_CTYPE=English_United States.1252
>>> > > [3] LC_MONETARY=English_United States.1252
>>> > > [4] LC_NUMERIC=C
>>> > > [5] LC_TIME=English_United States.1252
>>> > >
>>> > > attached base packages:
>>> > > [1] splines   stats     graphics  grDevices utils     datasets  methods
>>> > > [8] base
>>> > >
>>> > > other attached packages:
>>> > > [1] survival_2.35-7
>>> > >
>>> > > loaded via a namespace (and not attached):
>>> > > [1] tools_2.10.0
>>> > >
>>> > > > So I certainly would not have proceeded to submit a full analysis to
>>> > > > clogit if I could not get a test case to run under the situation you
>>> > > > propose.
>>> > > >
>>> > > > --
>>> > > > David
>>> > > >
>>> > > > > I have checked the collinearity among predictors and they are all
>>> > > > > < 0.5 (which I think is OK). Do you know what else could make this
>>> > > > > errors?
>>> > > > >
>>> > > > > Thanks a lot
>>> > > > >
>>> > > > > Hien Nguyen
>>> > > > >
>>> > > > > David Winsemius wrote:
>>> > > > > > On Dec 4, 2009, at 9:22 AM, Hien Nguyen wrote:
>>> > > > > >
>>> > > > > > > Dear R-helpers,
>>> > > > > > >
>>> > > > > > > I am very new to R and trying to run the conditional logit
>>> > > > > > > model using "clogit " command.
>>> > > > > > > I have more than 4000 observations in my dataset and try to
>>> > > > > > > predict the dependent variable from 14 independent variables.
>>> > > > > > > My command is as follows
>>> > > > > > >
>>> > > > > > > clmtest1 <-
>>> > > > > > > clogit(Pin~Income+Bus+Pop+Urbpro+Health+Student+Grad+NE+NW+NCC+SCC+CH+SE+MRD+strata(IDD),data=clmdata)
>>> > > > > > >
>>> > > > > > > However, it produces the following errors:
>>> > > > > > >
>>> > > > > > > Error in fitter(X, Y, strats, offset, init, control, weights = weights, :
>>> > > > > > >   NA/NaN/Inf in foreign function call (arg 6)
>>> > > > > > > In addition: Warning messages:
>>> > > > > > > 1: In Surv(rep(1, 4096L), Pinmig) : Invalid status value, converted to NA
>>> > > > > > > 2: In fitter(X, Y, strats, offset, init, control, weights = weights, :
>>> > > > > > >   Ran out of iterations and did not converge
>>> > > > > > >
>>> > > > > > > I search the error message from R forums but it does not say
>>> > > > > > > anything for Conditional Logit Model.
>>> > > > > >
>>> > > > > > With that many predictors in a small dataset, you may have created
>>> > > > > > matrix singularities. Perhaps you created a stratum where all of
>>> > > > > > the subjects experience the event and others where none did so.
>>> > > > > > The coefficients might be driven to infinities. Try simplifying
>>> > > > > > the model.
>>> > > > > >
>>> > > > > > > Please check for me what it says and what should I do to solve
>>> > > > > > > it.
>>> > > >
>>> > > > David Winsemius, MD
>>> > > > Heritage Laboratories
>>> > > > West Hartford, CT
>>> > > >
>>> > > > ______________________________________________
>>> > > > R-help at r-project.org mailing list
>>> > > > https://stat.ethz.ch/mailman/listinfo/r-help
>>> > > > PLEASE do read the posting guide
>>> > > > http://www.R-project.org/posting-guide.html
>>> > > > and provide commented, minimal, self-contained, reproducible code.
>>> > >
>>> > > Charles C. Berry                            (858) 534-2098
>>> > >                                             Dept of Family/Preventive Medicine
>>> > > E mailto:cberry at tajo.ucsd.edu             UC San Diego
>>> > > http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
>>> >
>>> Thomas Lumley Assoc. Professor, Biostatistics
>>> tlumley at u.washington.edu University of Washington, Seattle
>>>
>>
>>
>>
>
> Charles C. Berry                            (858) 534-2098
>                                             Dept of Family/Preventive Medicine
> E mailto:cberry at tajo.ucsd.edu             UC San Diego
> http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
>
>