[R] Survival analysis MLE gives NA or enormous standard errors

Charles C. Berry cberry at tajo.ucsd.edu
Tue Jul 27 18:06:15 CEST 2010


On Tue, 27 Jul 2010, Christopher David Desjardins wrote:

> Hi Charles,
>
> On Fri, 2010-07-23 at 14:40 -0700, Charles C. Berry wrote:
>> On Fri, 23 Jul 2010, Christopher David Desjardins wrote:
>>
>>> Sorry. I should have included some data. I've attached a subset of my
>>> data (50/192) cases in a Rdata file and have pasted it below.
>>>
>>> Running anova I get the following:
>>>
>>>> anova(sr.reg.s4.nore)
>>>                   Df Deviance Resid. Df    -2*LL P(>|Chi|)
>>> NULL               NA       NA        45 33.89752        NA
>>> as.factor(lifedxm)  2 2.438211        43 31.45931 0.2954943
>>>
>>> That would just be an omnibus test right and should that first NULL NA
>>> line be worrisome? What if I want to test specifically that CONTROL and
>>> BIPOLAR were different and that MAJOR DEPRESSION and BIPOLAR were
>>> different?
>>
>
> You wrote:
>
>> Construct a likelikehood ratio test for each hypothesis by fitting three
>> models - two containing each term  and one containing both - and comparing
>> each simpler model to the fuller model.
>>
>
> Would that be recoding lifedxm (presently a dummy variable where 0 -
> BIPOLAR, 1 - CONTROL, and 2 - MAJOR DEPRESSED) as three seperate
> variables: CONTROL (0 - No, 1 - Yes), BIPOLAR (0 - N0, 1 - Yes), and
> MAJOR DEPRESSED (0 - No, 1 - Yes) and then comparing the following
> models with anova()?
>
> CONTROL + BIPOLAR to MAJOR
> CONTROL + MAJOR to BIPOLAR
>
> I am sorry I am just a little confused. Basically I want to know if
> BIPOLAR is a higher risk than MAJOR and CONTROL and if MAJOR is a higher
> risk than CONTROL.
>

It would help communication if you would use a standard notation like R.

The meaning of pseudocodes tends to be a bit fuzzy. And from far below, it 
seems you have a "factor" called lifedxm not a 'dummy variable' in the 
sense that that term is often used.

Here are three models defined using the formula language:


frm1 <- Surv(age_sym4, sym4) ~ I(lifedxm=="MAJOR")

frm2 <- Surv(age_sym4, sym4) ~ I(lifedxm=="BIPOLAR")

frm3 <- Surv(age_sym4, sym4) ~ I(lifedxm=="MAJOR") +
 	I(lifedxm=="BIPOLAR")

The model of frm1 is nested under that of frm3.

The model of frm2 is nested under that of frm3.

anova( frm1, frm3 ) will report on the effect of adding 
I(lifedxm=="BIPOLAR") to frm1. i.e. it will give the increase in 
likelihood associated with that term.

anova( frm2, frm3 ) will report on the effect of 
adding I(lifedxm=="MAJOR") to frm2.

Chuck



> Thank you very much for all your help,
> Chris
>
>>>
>>> I'll look at Hauck-Donner effect.
>>>
>>> Thanks,
>>> Chris
>>>
>>>> bip.surv.s
>>>   age_sym4 sym4 lifedxm
>>> 1  16.12868    0   MAJOR
>>> 2  19.32649    0   MAJOR
>>> 3  16.55031    0 CONTROL
>>> 4  19.36756    0 CONTROL
>>> 5  16.09035    0   MAJOR
>>> 6  21.50582    0   MAJOR
>>> 7  16.36140    0   MAJOR
>>> 8  20.57221    0   MAJOR
>>> 9  16.45722    0 CONTROL
>>> 10 19.94524    0 CONTROL
>>> 11 15.79192    0   MAJOR
>>> 12 20.76660    0   MAJOR
>>> 13 16.15058    0 BIPOLAR
>>> 14 19.25804    0 BIPOLAR
>>> 15 17.36345    0   MAJOR
>>> 16 21.18001    0   MAJOR
>>> 17       NA    0 BIPOLAR
>>> 18       NA    0 BIPOLAR
>>> 19 16.31759    1   MAJOR
>>> 20 18.29706    0   MAJOR
>>> 21 16.40794    0   MAJOR
>>> 22 19.13758    0   MAJOR
>>> 23 16.19439    0 CONTROL
>>> 24 21.36893    0 CONTROL
>>> 25 15.89049    0 CONTROL
>>> 26 18.99795    0 CONTROL
>>> 27       NA    0 BIPOLAR
>>> 28 18.90486    0 BIPOLAR
>>> 29 16.36413    0   MAJOR
>>> 30 20.42710    0   MAJOR
>>> 31 16.65982    0   MAJOR
>>> 32 19.45791    0   MAJOR
>>> 33 16.64339    0 CONTROL
>>> 34 19.40041    0 CONTROL
>>> 35 15.37303    1 BIPOLAR
>>> 36 19.83847    0 BIPOLAR
>>> 37 15.42231    1   MAJOR
>>> 38 19.37029    0   MAJOR
>>> 39 15.06913    0   MAJOR
>>> 40 17.81520    0   MAJOR
>>> 41 15.50445    0 BIPOLAR
>>> 42 17.92197    0 BIPOLAR
>>> 43 15.34565    0 CONTROL
>>> 44 18.07529    0 CONTROL
>>> 45 15.59480    0 CONTROL
>>> 46 19.67420    0 CONTROL
>>> 47 14.78987    0   MAJOR
>>> 48 20.05476    0   MAJOR
>>> 49 14.78713    0   MAJOR
>>> 50 19.86858    0   MAJOR
>>>
>>>
>>> On Fri, 2010-07-23 at 11:52 -0700, Charles C. Berry wrote:
>>>> On Fri, 23 Jul 2010, Christopher David Desjardins wrote:
>>>>
>>>>> Hi,
>>>>> I am trying to fit the following model:
>>>>>
>>>>> sr.reg.s4.nore <- survreg(Surv(age_sym4,sym4), as.factor(lifedxm),
>>>>> data=bip.surv)
>>>>
>>>> Next time include a reproducible example. i.e. something we can run.
>>>>
>>>> Now, Google "Hauck Donner Effect" to understand why
>>>>
>>>>  	anova(sr.reg.s4.nore)
>>>>
>>>> is preferred.
>>>>
>>>> Chuck
>>>>
>>>>
>>>>>
>>>>> Where age_sym4 is the age that a subject develops clinical thought
>>>>> problems; sym4 is whether they develop clinical thoughts problems (0 or
>>>>> 1); and lifedxm is mother's diagnosis: BIPOLAR, MAJOR DEPRESSION, or
>>>>> CONTROL.
>>>>>
>>>>> I am interested in whether or not survival differs by this covariate.
>>>>>
>>>>> When I run my model, I am getting the following output:
>>>>>
>>>>>> summary(sr.reg.s4.nore)
>>>>>
>>>>> Call:
>>>>> survreg(formula = Surv(age_sym4, sym4) ~ as.factor(lifedxm),
>>>>>    data = bip.surv)
>>>>>                           Value Std. Error     z       p
>>>>> (Intercept)                4.037      0.455  8.86643
>>>>> 0.000000000000000000755
>>>>> as.factor(lifedxm)CONTROL 14.844   4707.383  0.00315
>>>>> 0.997484052845082791450
>>>>> as.factor(lifedxm)MAJOR    0.706      0.447  1.58037
>>>>> 0.114022774867277756905
>>>>> Log(scale)                -0.290      0.267 -1.08493
>>>>> 0.277952437474223823521
>>>>>
>>>>> Scale= 0.748
>>>>>
>>>>> Weibull distribution
>>>>> Loglik(model)= -76.3   Loglik(intercept only)= -82.6
>>>>> 	Chisq= 12.73 on 2 degrees of freedom, p= 0.0017
>>>>> Number of Newton-Raphson Iterations: 21
>>>>> n=186 (6 observations deleted due to missingness)
>>>>>
>>>>>
>>>>> I am concerned about the p-value of 0.997 and the SE of 4707. I am
>>>>> curious if it has to do with the fact that the CONTROL group doesn't
>>>>> have a mixed response, meaning that all my subjects do not develop
>>>>> clinical levels of thought problems and subsequently 'survive'.
>>>>>
>>>>>> table(bip.surv$sym4,bip.surv$lifedxm)
>>>>>
>>>>>    BIPOLAR CONTROL MAJOR
>>>>>  0      41      60    78
>>>>>  1       7       0     6
>>>>>
>>>>> Is there some sort of way that I can overcome this? Is my model
>>>>> misspecified? Is this better suited to be run as a Bayesian model using
>>>>> priors to overcome the lack of a mixed response?
>>>>>
>>>>> Also, please cc me on an email as I am a digest subscriber.
>>>>> Thanks,
>>>>> Chris
>>>>>
>>>>>
>>>>> --
>>>>> Christopher David Desjardins
>>>>> PhD student, Quantitative Methods in Education
>>>>> MS student, Statistics
>>>>> University of Minnesota
>>>>> 192 Education Sciences Building
>>>>> http://cddesjardins.wordpress.com
>>>>>
>>>>> ______________________________________________
>>>>> R-help at r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>
>>>>
>>>> Charles C. Berry                            (858) 534-2098
>>>>                                              Dept of Family/Preventive Medicine
>>>> E mailto:cberry at tajo.ucsd.edu	            UC San Diego
>>>> http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
>>>>
>>>>
>>>
>>> --
>>> Christopher David Desjardins
>>> PhD student, Quantitative Methods in Education
>>> MS student, Statistics
>>> University of Minnesota
>>> 192 Education Sciences Building
>>> http://cddesjardins.wordpress.com
>>>
>>
>> Charles C. Berry                            (858) 534-2098
>>                                              Dept of Family/Preventive Medicine
>> E mailto:cberry at tajo.ucsd.edu	            UC San Diego
>> http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
>>
>>
>
> -- 
> Christopher David Desjardins
> PhD student, Quantitative Methods in Education
> MS student, Statistics
> University of Minnesota
> 192 Education Sciences Building
> http://cddesjardins.wordpress.com
>
>

Charles C. Berry                            (858) 534-2098
                                             Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu	            UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901



More information about the R-help mailing list