[R] Error when running Conditional Logit Model
David Winsemius
dwinsemius at comcast.net
Sat Dec 19 05:02:54 CET 2009
On Dec 18, 2009, at 7:39 PM, Hien Nguyen wrote:
> Thanks a lot for answering my questions.
>
> I have tried running clogit on only 64 observations and 4
> independent variables, and it finishes instantly. However, when I
> run the same command (with only 4 independent variables) on the
> full data, it has now been running for 50 minutes. :(
>
> Thomas, what do you mean by "maximizing the unconditional likelihood
> is fine when the stratum sizes are large"? What I put in
> "strata(__)" is actually the possible choices (1-64). Each choice is
> recorded more than 4000 times (which means I have more than 4000
> values of 1, 4000 values of 2, and so on).
> Does that sound right?
I'm pretty sure he means using glm(formula, family="binomial", ...)
and skipping the strata specification.
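A minimal sketch of what an unconditional fit could look like, using simulated data (the variable names, the four strata of 5000 rows each, and the true slope of 0.8 are all hypothetical). Here the stratum enters glm() as a fixed-effect factor, which is one common reading of "maximizing the unconditional likelihood"; dropping the factor(stratum) term gives the no-strata version described above.

```r
# Hypothetical simulated data: 4 large strata with different baseline log-odds
set.seed(1)
n_per_stratum <- 5000
stratum <- rep(1:4, each = n_per_stratum)
x <- rnorm(length(stratum))
baseline <- c(-1, -0.5, 0, 0.5)[stratum]   # stratum-specific intercepts
case <- rbinom(length(stratum), size = 1, prob = plogis(baseline + 0.8 * x))
dat <- data.frame(case, x, stratum)

# Unconditional logistic regression: one intercept per stratum, no strata()
fit_u <- glm(case ~ x + factor(stratum), family = binomial, data = dat)
coef(fit_u)["x"]   # estimated slope for x, should be near the simulated 0.8
```

With only a handful of large strata this adds just a few intercept columns, so the fit stays quick. With many small strata (matched pairs, say) the unconditional estimates are biased, which is the situation conditional logistic regression is designed for.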
--
David.
>
> Thanks a lot
>
> Hien
>
> tlumley at u.washington.edu wrote:
>> On Fri, 18 Dec 2009, Hien Nguyen wrote:
>>
>>> Dear Drs Winsemius and Berry,
>>>
>>> Thanks a lot for your comments and suggestions on running my
>>> model. I am new not only to R but also to CLM. :( With your
>>> suggestions, I figured out that I had huge misunderstandings
>>> about the model and the data arrangement.
>>>
>>> After my finals, I reread the related material on CLM and
>>> rearranged my data appropriately before running the model in R.
>>> This time, I have a dataset of more than 250,000 observations
>>> (created from more than 4000 responses) and a model with 15
>>> predictors.
>>>
>>> My question is how long the clogit command should take to run,
>>> because it has been running for more than 10 hours on a quad-core
>>> computer and still shows no sign of being done or almost done. Is
>>> this normal, or does my command just not work?
>>
>> If you have a lot of records with case=1 in a stratum, conditional
>> logistic regression will be extremely slow. And unnecessary:
>> maximizing the unconditional likelihood is fine when the stratum
>> sizes are large.
>>
>> Note that a quad-core computer won't help. Only one core will be
>> used in the computations.
>>
>> -thomas
>>
>>
>>
>>
>>> Thanks a lot for your response
>>>
>>> Hien
>>>
>>>
>>> Charles C. Berry wrote:
>>>> On Fri, 4 Dec 2009, David Winsemius wrote:
>>>>
>>>>>
>>>>> On Dec 4, 2009, at 5:49 PM, Hien Nguyen wrote:
>>>>>
>>>>>> Dear Dr. Winsemius,
>>>>>>
>>>>>> Thank you very much for your reply.
>>>>>>
>>>>>> I have tried many possible combinations (even a model with
>>>>>> only 2 predictors), but it produces the same message. With more
>>>>>> than 4000 observations, I think 14 predictors might not be too
>>>>>> many.
>>>>>
>>>>> It is what happens in the factor combinations that concerns me. I
>>>>> am guessing that some of those predictors are factors. You
>>>>> really should not ask r-help questions without providing better
>>>>> descriptions of both the outcomes and the predictor variables.
>>>>>
>>>>>>
>>>>>> Although my dependent variable (Pin) is not discrete (it
>>>>>> ranges from 0 to 1), I do not think it will cause problems for
>>>>>> the estimation, but I'm not sure.
>>>>>
>>>>> I would think it _would_ cause problems. As I understand it,
>>>>> conditional methods create contingency tables. Why are you using
>>>>> an outcome type that is not consistent with the fundamental
>>>>> regression assumptions of the clogit function?
>>>>>
>>>>> I do not get that particular error when I munge the infert
>>>>> dataset to have case be a random uniform value, but I do get an
>>>>> error.
>>>>>> infert$case <- runif(nrow(infert))
>>>>>> clogit(case~spontaneous+induced+strata(stratum),data=infert)
>>>>> Error in Surv(rep(1, 248L), case) : Invalid status value
>>>>>
>>>>
>>>> David, I think you were on the right track. I get this:
>>>>
>>>> -----------
>>>>> clogit(I(case*runif(length(case)))~spontaneous+induced
>>>>> +strata(ifelse(stratum>40,NA,stratum)),data=infert)
>>>>
>>>> Error in fitter(X, Y, strats, offset, init, control, weights =
>>>> weights, :
>>>> NA/NaN/Inf in foreign function call (arg 6)
>>>> In addition: Warning messages:
>>>> 1: In Surv(rep(1, 248L), I(case * runif(length(case)))) :
>>>> Invalid status value, converted to NA
>>>> 2: In fitter(X, Y, strats, offset, init, control, weights =
>>>> weights, :
>>>> Ran out of iterations and did not converge
>>>>>
>>>> ------------
>>>>
>>>> which looks pretty much the same as Hien's error msg
>>>>
>>>> So Hien needs to create a logical status value.
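A minimal sketch of that fix, using the same infert data: clogit() treats the outcome as an event status, so it should be logical or coded 0/1. The helper is_valid_status() below is hypothetical, written only to make the check explicit before fitting. Turning a continuous outcome such as Hien's Pin into 0/1 would be a modeling decision in its own right, not just a formatting step.

```r
library(survival)  # clogit(); infert ships with base R's datasets package

# Hypothetical helper: TRUE if every non-missing value is a valid event status
is_valid_status <- function(y) {
  y <- y[!is.na(y)]
  is.logical(y) || all(y %in% c(0, 1))
}

is_valid_status(infert$case)           # 0/1 coding, so TRUE
is_valid_status(runif(nrow(infert)))   # continuous values in (0, 1), so FALSE

# With the original 0/1 case indicator, the infert example fits without error
fit_c <- clogit(case ~ spontaneous + induced + strata(stratum), data = infert)
summary(fit_c)
```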
>>>>
>>>> Chuck
>>>>
>>>> p.s.
>>>>
>>>>> sessionInfo()
>>>> R version 2.10.0 (2009-10-26)
>>>> i386-pc-mingw32
>>>>
>>>> locale:
>>>> [1] LC_COLLATE=English_United States.1252
>>>> [2] LC_CTYPE=English_United States.1252
>>>> [3] LC_MONETARY=English_United States.1252
>>>> [4] LC_NUMERIC=C
>>>> [5] LC_TIME=English_United States.1252
>>>>
>>>> attached base packages:
>>>> [1] splines stats graphics grDevices utils datasets
>>>> methods
>>>> [8] base
>>>>
>>>> other attached packages:
>>>> [1] survival_2.35-7
>>>>
>>>> loaded via a namespace (and not attached):
>>>> [1] tools_2.10.0
>>>>>
>>>>
>>>>
>>>>> So I certainly would not have proceeded to submit a full
>>>>> analysis to clogit if I could not get a test case to run under
>>>>> the situation you propose.
>>>>>
>>>>> --
>>>>> David
>>>>>
>>>>>>
>>>>>> I have checked the collinearity among predictors and they are
>>>>>> all < 0.5 (which I think is OK). Do you know what else could
>>>>>> cause these errors?
>>>>>>
>>>>>> Thanks a lot
>>>>>>
>>>>>> Hien Nguyen
>>>>>>
>>>>>> David Winsemius wrote:
>>>>>> > On Dec 4, 2009, at 9:22 AM, Hien Nguyen wrote:
>>>>>> >> Dear R-helpers,
>>>>>> >>
>>>>>> >> I am very new to R and am trying to run the conditional logit
>>>>>> >> model using the "clogit" command. I have more than 4000
>>>>>> >> observations in my dataset and am trying to predict the
>>>>>> >> dependent variable from 14 independent variables. My command
>>>>>> >> is as follows:
>>>>>> >>
>>>>>> >> clmtest1 <- clogit(Pin~Income+Bus+Pop+Urbpro+Health+Student+Grad+
>>>>>> >>     NE+NW+NCC+SCC+CH+SE+MRD+strata(IDD),data=clmdata)
>>>>>> >>
>>>>>> >> However, it produces the following errors:
>>>>>> >>
>>>>>> >> Error in fitter(X, Y, strats, offset, init, control, weights = weights, :
>>>>>> >>   NA/NaN/Inf in foreign function call (arg 6)
>>>>>> >> In addition: Warning messages:
>>>>>> >> 1: In Surv(rep(1, 4096L), Pinmig) : Invalid status value, converted to NA
>>>>>> >> 2: In fitter(X, Y, strats, offset, init, control, weights = weights, :
>>>>>> >>   Ran out of iterations and did not converge
>>>>>> >>
>>>>>> >> I searched R forums for the error message, but found nothing
>>>>>> >> about the conditional logit model.
>>>>>> >
>>>>>> > With that many predictors in a small dataset, you may have
>>>>>> > created matrix singularities. Perhaps you created a stratum where
>>>>>> > all of the subjects experience the event and others where none
>>>>>> > did so. The coefficients might be driven to infinities. Try
>>>>>> > simplifying the model.
>>>>>> >
>>>>>> >> Please check for me what it says and what I should do to solve it.
>>>>>> >
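One way to look for the problem strata David describes, as a sketch: in conditional logistic regression, a stratum in which every subject is a case, or none is, contributes a constant to the conditional likelihood, so it carries no information about the coefficients. The helper uninformative_strata() is hypothetical and uses only base R; the toy vectors stand in for a real 0/1 outcome and a stratum variable such as IDD.

```r
# Hypothetical helper: names of strata in which the outcome never varies
uninformative_strata <- function(y, stratum) {
  varies <- tapply(y, stratum, function(v) length(unique(v[!is.na(v)])) > 1)
  names(varies)[!as.vector(varies)]
}

# Toy data: stratum 1 mixes cases and controls, strata 2 and 3 do not
y <- c(1, 0, 0, 1, 1, 0, 0)
stratum <- c(1, 1, 1, 2, 2, 3, 3)
uninformative_strata(y, stratum)   # "2" "3"

# Cases per stratum; Thomas notes that many cases per stratum make clogit slow
tapply(y, stratum, sum)
```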
>>>>>
>>>>> David Winsemius, MD
>>>>> Heritage Laboratories
>>>>> West Hartford, CT
>>>>>
>>>>> ______________________________________________
>>>>> R-help at r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>
>>>>
>>>> Charles C. Berry (858) 534-2098
>>>> Dept of Family/
>>>> Preventive Medicine
>>>> E mailto:cberry at tajo.ucsd.edu UC San Diego
>>>> http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego
>>>> 92093-0901
>>>>
>>>>
>>>
>>>
>>
>> Thomas Lumley Assoc. Professor, Biostatistics
>> tlumley at u.washington.edu University of Washington, Seattle
>>
David Winsemius, MD
Heritage Laboratories
West Hartford, CT