[R] Define a glm object with user-defined coefficients (logistic regression, family="binomial")
David Winsemius
dwinsemius at comcast.net
Sun Nov 14 16:36:31 CET 2010
On Nov 14, 2010, at 8:35 AM, Jürgen Biedermann wrote:
> Hi David,
>
> Thank you very much for your answer. It helps me a lot. The offset
> argument was the key (I didn't understand the description in the R-
> help file)
> Rereading my email I found a mistake in the definition of my
> formula. Instead of p(y) = exp(a + c1*x1 + c2*x2), it has to be:
> p(y) = exp(a + c1*x1 + c2*x2)/(1+exp(a + c1*x1 + c2*x2)), but
> actually that doesn't matter much in our case.
Oh, but I think it does matter (at least in the situation where
p(event) are above 1/20 or so. At lower probabilities it wouldn't
matter very much. You are posting above the default logistic equation
used when the distribution is specified as "binomial" which I meant to
include but forgot. Had I remembered to include the ...,
family="binomial" it would have implied what you write above. So:
firstfit <- glm( c(event,no_event) ~ 1 + x1 +x2, data=dfrm,
family="binomial")
# Is the model you cite above and I meant to write:
forcedfit <- glm( c(event,no_event) ~ 1 + offset(logoff), data=dfrm,
family="binomial")
(I think the only reason this went unremarked by the usually eagle-
eyed rhelp viewership is the weekend has a decreased viewing activity.)
>
>> The anova results would have not much interpretability in this
>> setting. You would be testing for the Intercept being zero under
>> very artificial conditions. You have eliminated much statistical
>> meaning by forcing the form of the results.
>
> Imagine the following. I develop a model on one dataset and want to
> validate it on another. So I could use the coefficents trained on
> the first dataset to define a glm model (named: ModelV) on the
> second dataset. Then i could test this model against a NULL model
> (named: ModelV0) of the second dataset with anova(ModelV, ModelV0,
> test="Chisq").
Assuming this is a split dataset from run 1 in a split datast, why
not develop a score from the the first set and run it against a model
that includes the score and the original variables.
Model 1: mod1 <- glm( c(event,no_event) ~ 1 + x1 +x2, data=dfrm1)
>> dfrm2$scr1 <- with(dfrm2, Intercept + c1*x1 + c2*x2))
Model2: mod2 <- glm( c(event,no_event) ~ 0 + x1 +x2 +scr1, data=dfrm2)
(In this manner impact from the deviations induced by varying X values
would be assessed as evidence of potential model mis-specification.
This would be most successful with larger datasets. It would probably
be a better use of a smaller dataset, however, to get a bootstrapped
validation of a model. Or to do a one-tenth holdout and do the
validations run ten times. Harrells' "Regression Modeling Strategies"
gives a good discussion of validation strategies and his rms amd Hmisc
packages provide the functions to perform the validation and display
results neatly.)
>
> Best Wishes
> Jürgen
>
> --
> -----------------------------------
> Jürgen Biedermann
> Blücherstraße 56
> 10961 Berlin-Kreuzberg
> Mobil: +49 176 247 54 354
> Home: +49 30 250 11 713
> e-mail: juergen.biedermann at gmail.com
>
>
> --------- Korrespondenz ----------
>
> Betreff: Re: [R] Define a glm object with user-defined coefficients
> (logistic regression, family="binomial")
> Von: David Winsemius <dwinsemius at comcast.net>
> An: Jürgen Biedermann <juergen.biedermann at googlemail.com>
> Datum: 13.11.2010 17:15
>>
>> On Nov 13, 2010, at 7:43 AM, Jürgen Biedermann wrote:
>>
>>> Hi there,
>>>
>>> I just don't find the solution on the following problem. :(
>>>
>>> Suppose I have a dataframe with two predictor variables (x1,x2)
>>> and one depend binary variable (y). How is it possible to define a
>>> glm object (family="binomial") with a user defined logistic
>>> function like p(y) = exp(a + c1*x1 + c2*x2) where c1,c2 are the
>>> coefficents which I define. So I would like to do no fitting of
>>> the coefficients. Still, I would like to define a GLM object
>>> because I could then easily use other functions which need a glm
>>> object as argument (e.g. I could use the anova,
>>
>> The anova results would have not much interpretability in this
>> setting. You would be testing for the Intercept being zero under
>> very artificial conditions. You have eliminated much statistical
>> meaning by forcing the form of the results.
>>
>>> summary functions).
>>
>> # Assume dataframe name is dfrm with variables event, no_event, x1,
>> x2, and further assume c1 and c2 are also defined:
>>
>> dfrm$logoff <- with(dfrm, log(c1*x1 + c2*x2))
>> forcedfit <- glm( c(event,no_event) ~ 1 + offset(logoff), data=dfrm)
>>
>> (Obviously untested.)
>>
>>>
>>> Thank you very much! Greetings
>>> Jürgen
>>>
>>> --
>>> -----------------------------------
>>> Jürgen Biedermann
>>
>>
>> David Winsemius, MD
>> West Hartford, CT
>>
David Winsemius, MD
West Hartford, CT
More information about the R-help
mailing list