[R] Error when running Conditional Logit Model
David Winsemius
dwinsemius at comcast.net
Fri Dec 18 20:56:59 CET 2009
On Dec 18, 2009, at 2:46 PM, Hien Nguyen wrote:
> Dear Drs Winsemius and Berry,
>
> Thanks a lot for your comments and suggestions on running my model. I
> am not just new to R but new to CLM as well. :( With your
> suggestions, I figured out that I had serious misunderstandings about
> the model and the data arrangement.
>
> After my finals, I read the related materials on CLM again and
> rearranged the data appropriately before running the model in R. This
> time, I have a dataset of more than 250,000 observations (created
> from more than 4000 responses) and a model with 15 predictors.
>
> My question is how long the clogit command should take to run. It has
> been running for more than 10 hours on a quad-core computer and still
> shows no sign of being done or almost done. Is that OK, or is my
> command just not working?
Quad-core would not help speed unless you used a multi-core
application, which base R is not. Memory and OS are also important to
specify. I don't have experience with that function, but I suspect
your machine is hung. My ordinary logistic regressions and Cox models
with 4.5 million cases, 40,000 events, and 12 df on the X side of the
formula take minutes. I would terminate the session, try the same
code on a 1/100 sample, check the system times, and then scale up to
the full data.
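
A minimal sketch of that subsample-and-time approach follows. It is an
illustration under assumptions, not tested on the poster's data: the
helper sample_strata() is hypothetical (not part of any package), and it
samples whole strata rather than individual rows on the assumption that
each stratum should stay intact for a conditional model. The built-in
infert data stands in for the real data, with a larger fraction because
infert has only 83 strata.

```r
library(survival)  # provides clogit() and strata()

# Hypothetical helper: keep every row belonging to a random fraction of
# the strata, so that each retained stratum stays complete.
sample_strata <- function(data, strata_col, frac = 0.01) {
  ids <- unique(data[[strata_col]])
  n_keep <- max(1, floor(length(ids) * frac))
  keep <- ids[sample.int(length(ids), n_keep)]
  data[data[[strata_col]] %in% keep, , drop = FALSE]
}

# Stand-in example on infert; frac = 0.5 keeps 41 of its 83 strata.
set.seed(1)
small <- sample_strata(infert, "stratum", frac = 0.5)

# Time the fit on the subsample before committing to the full data.
timing <- system.time(
  fit <- clogit(case ~ spontaneous + induced + strata(stratum),
                data = small)
)
print(timing["elapsed"])
```

For the data in this thread the call would presumably look like
sample_strata(clmdata, "IDD", frac = 0.01), followed by the same
system.time() wrapper around the clogit call; increasing frac in steps
shows how the run time grows before the full fit is attempted.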
--
David
>
> Thanks a lot for your response
>
> Hien
>
>
> Charles C. Berry wrote:
>> On Fri, 4 Dec 2009, David Winsemius wrote:
>>
>>>
>>> On Dec 4, 2009, at 5:49 PM, Hien Nguyen wrote:
>>>
>>>> Dear Dr. Winsemius,
>>>>
>>>> Thank you very much for your reply.
>>>>
>>>> I have tried many possible combinations (even with the model of
>>>> only 2 predictors) but it produces the same message. With more
>>>> than 4000 observations, I think 14 predictors might not be too
>>>> many.
>>>
>>> It is what happens in the factor combinations that concerns me. I
>>> am guessing that some of those predictors are factors. You really
>>> should not ask r-help questions without providing better
>>> descriptions of both the outcomes and the predictor variables.
>>>
>>>>
>>>> Although my dependent variable (Pin) is not discrete (it ranges
>>>> from 0 to 1), I do not think it will create problems for the
>>>> estimation, but I'm not sure.
>>>
>>> I would think it _would_ cause problems. As I understand it,
>>> conditional methods create contingency tables. Why are you using
>>> an outcome type that is not consistent with the fundamental
>>> regression assumptions of the clogit function?
>>>
>>> I do not get that particular error when I munge the infert dataset
>>> to have case be a random uniform value, but I do get an error.
>>> > infert$case <- runif(nrow(infert))
>>> > clogit(case~spontaneous+induced+strata(stratum),data=infert)
>>> Error in Surv(rep(1, 248L), case) : Invalid status value
>>>
>>
>> David, I think you were on the right track. I get this:
>>
>> -----------
>> > clogit(I(case*runif(length(case)))~spontaneous+induced+strata(ifelse(stratum>40,NA,stratum)),data=infert)
>> Error in fitter(X, Y, strats, offset, init, control, weights = weights, :
>>   NA/NaN/Inf in foreign function call (arg 6)
>> In addition: Warning messages:
>> 1: In Surv(rep(1, 248L), I(case * runif(length(case)))) :
>>   Invalid status value, converted to NA
>> 2: In fitter(X, Y, strats, offset, init, control, weights = weights, :
>>   Ran out of iterations and did not converge
>>>
>> ------------
>>
>> which looks pretty much the same as Hien's error msg
>>
>> So Hien needs to create a logical status value.
>>
>> Chuck
>>
>> p.s.
>>
>>> sessionInfo()
>> R version 2.10.0 (2009-10-26)
>> i386-pc-mingw32
>>
>> locale:
>> [1] LC_COLLATE=English_United States.1252
>> [2] LC_CTYPE=English_United States.1252
>> [3] LC_MONETARY=English_United States.1252
>> [4] LC_NUMERIC=C
>> [5] LC_TIME=English_United States.1252
>>
>> attached base packages:
>> [1] splines   stats     graphics  grDevices utils     datasets  methods
>> [8] base
>>
>> other attached packages:
>> [1] survival_2.35-7
>>
>> loaded via a namespace (and not attached):
>> [1] tools_2.10.0
>>>
>>
>>
>>> So I certainly would not have proceeded to submit a full analysis
>>> to clogit if I could not get a test case to run under the
>>> situation you propose.
>>>
>>> --
>>> David
>>>
>>>>
>>>> I have checked the collinearity among predictors and they are all
>>>> < 0.5 (which I think is OK). Do you know what else could cause
>>>> this error?
>>>>
>>>> Thanks a lot
>>>>
>>>> Hien Nguyen
>>>>
>>>> David Winsemius wrote:
>>>> > On Dec 4, 2009, at 9:22 AM, Hien Nguyen wrote:
>>>> > > Dear R-helpers,
>>>> > >
>>>> > > I am very new to R and trying to run the conditional logit
>>>> > > model using the "clogit" command.
>>>> > > I have more than 4000 observations in my dataset and try to
>>>> > > predict the dependent variable from 14 independent variables.
>>>> > > My command is as follows:
>>>> > >
>>>> > > clmtest1 <-
>>>> > > clogit(Pin~Income+Bus+Pop+Urbpro+Health+Student+Grad+NE+NW+NCC+SCC+CH+SE+MRD+strata(IDD),data=clmdata)
>>>> > >
>>>> > > However, it produces the following errors:
>>>> > >
>>>> > > Error in fitter(X, Y, strats, offset, init, control, weights = weights, :
>>>> > >   NA/NaN/Inf in foreign function call (arg 6)
>>>> > > In addition: Warning messages:
>>>> > > 1: In Surv(rep(1, 4096L), Pinmig) : Invalid status value, converted to NA
>>>> > > 2: In fitter(X, Y, strats, offset, init, control, weights = weights, :
>>>> > >   Ran out of iterations and did not converge
>>>> > >
>>>> > > I searched for the error message in R forums, but found
>>>> > > nothing about it for the Conditional Logit Model.
>>>> >
>>>> > With that many predictors in a small dataset, you may have
>>>> > created matrix singularities. Perhaps you created a stratum
>>>> > where all of the subjects experience the event and others where
>>>> > none did so. The coefficients might be driven to infinities.
>>>> > Try simplifying the model.
>>>> >
>>>> > > Please check for me what it says and what I should do to
>>>> > > solve it.
>>>> >
>>>
>>> David Winsemius, MD
>>> Heritage Laboratories
>>> West Hartford, CT
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>> Charles C. Berry                            (858) 534-2098
>>                                             Dept of Family/Preventive Medicine
>> E mailto:cberry at tajo.ucsd.edu            UC San Diego
>> http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
>>
>>
David Winsemius, MD
Heritage Laboratories
West Hartford, CT