[R] alternative to logistic regression

Fri Nov 16 16:41:18 CET 2007

>From: Prof Brian Ripley <ripley at stats.ox.ac.uk>
>Date: 2007/11/16 Fri AM 09:28:27 CST
>To: Terry Therneau <therneau at mayo.edu>
>Cc: markleeds at verizon.net, r-help at r-project.org
>Subject: Re: [R] alternative to logistic regression

Thanks to both of you, Terry and Brian, for your comments. I'm not sure what I am going to do yet because I don't have enough data to explore or
confirm my linear hypothesis, but your comments
will help if I go that route.

I have one other question while you are both thinking about GLMs: suppose one
is doing logistic, or more generally multinomial, regression with one predictor. The predictor is quantitative
with range [-1,1], although if I scale it
the range changes accordingly.

There is also the possibility of turning the predictor into a factor, say by cutting it into deciles and using the ten decile bins as the factor levels.

My question is whether one would expect roughly the same probability forecasts from the two models, one using the numerical predictor and one using the factor. I imagine that it shouldn't matter much, but I have ZERO experience in logistic regression and I'm not confident in my current intuition. Thanks so much for discussing my problem; I really appreciate your insights.
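
A minimal sketch of that comparison, on simulated data only. The sample size, the seed, and the "true" logit-linear model are assumptions made up for illustration, not anything from the real data. It fits the numeric and the decile-factor versions and looks at how far apart the fitted probabilities are. It also checks that linearly rescaling the numeric predictor changes the coefficients but not the fitted probabilities.

```r
set.seed(42)                               # arbitrary seed; simulated data only
n <- 2000
x <- runif(n, -1, 1)                       # quantitative predictor in [-1, 1]
y <- rbinom(n, 1, plogis(0.5 + 1.5 * x))   # assumed truth: linear on the logit scale

# Decile bins of x as a 10-level factor
breaks <- quantile(x, probs = seq(0, 1, by = 0.1))
xdec   <- cut(x, breaks = breaks, include.lowest = TRUE)

fit_num <- glm(y ~ x,    family = binomial)
fit_fac <- glm(y ~ xdec, family = binomial)

# Rescaled copy of x: different coefficients, same fitted probabilities
xs         <- (x - mean(x)) / sd(x)
fit_scaled <- glm(y ~ xs, family = binomial)
all.equal(unname(fitted(fit_num)), unname(fitted(fit_scaled)), tolerance = 1e-6)

p_num <- fitted(fit_num)
p_fac <- fitted(fit_fac)

# The factor model is a step function: one fitted probability per decile,
# equal to the observed proportion of successes in that decile
cbind(observed = tapply(y, xdec, mean), fitted = tapply(p_fac, xdec, mean))

# Size of the disagreement between the two sets of forecasts
summary(abs(p_num - p_fac))
```

When the relationship really is smooth, the two sets of forecasts tend to agree roughly, with the factor version noisier within each bin; the factor version spends nine parameters where the numeric one spends a single slope.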

                                 Mark

>On Fri, 16 Nov 2007, Terry Therneau wrote:
>
>> You can fit a linear probability model with glm and a bit of arm twisting.
>> First, make your own copy of the binomial function:
>>   > dump('binomial', file='mybinom.R')
>>
>> Edit it to change the function name to "mybinom" (or anything else you
>> like), and to add 'identity' to the list of okLinks.
>
>Hmm ... I think you are generalizing from another R-like system.
>
>binomial("identity") works out of the box, and R's glm() will 
>backtrack if it encounters a fitted value < 0 or > 1.  Now, the 
>back-tracking can get stuck but it often does a reasonable job.
>
>Examples:
>
>set.seed(1)
>x <- seq(0, 1, length.out=10)
>y <- rbinom(10, 10, 0.1 + 0.8*x)
>glm(cbind(y, 10-y)  ~ x, binomial("identity"), start=c(0.5,0))
>
>works.  But variants, e.g.
>
>y <- rbinom(10, 10, 0.1 + 0.9*x)
>
>backtrack and give warnings.
>
>What does not work is binomial(identity), unlike binomial(logit).
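
The quoted-versus-unquoted distinction can be checked directly. This is a small sketch; the object names `fam` and `msg` are just illustrative, and the exact wording of the error message may differ between R versions.

```r
# A quoted link name is accepted by the binomial family
fam <- binomial("identity")
fam$link

# The bare symbol works for the standard links such as logit ...
binomial(logit)$link

# ... but not for identity; capture the error message instead of stopping
msg <- tryCatch(binomial(identity), error = function(e) conditionMessage(e))
msg
```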
>
>>  Source the file back in, and use mybinom('identity') to fit the model.
>>
>>  Caveat Emptor: This will work just fine if all of your data is far enough away
>> from 0 or 1.  But if the predicted value for any data point, at any time during
>> the iteration, is <=0 or >=1 then the calculation of the log-likelihood will
>> involve an NA and the glm routine will fail.  NAs produced deep inside a
>> computation tend to produce unhelpful and/or misleading error messages
>> (sometimes the NA can propagate some way through the code before creating a
>> failure).  You can also get the counterintuitive result that models with few or
>> poor covariates work (all the predictions are near to the mean), but more useful
>> covariates cause the model to fail.
>>
>>  Linear links for both the binomial and Poisson are a challenging computational
>> problem.  But they are becoming important in medical work due to recent
>> appreciation that the absolute risk attached to a variable is often more
>> relevant than the relative risk (odds ratio or risk ratio).
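
A hedged illustration of the absolute-versus-relative distinction, on simulated data with a single binary covariate. The variable name `trt`, the risks 0.20 and 0.30, and the seed are all assumptions for the sketch. With the identity link the slope is the risk difference; with the logit link the exponentiated slope is the odds ratio.

```r
set.seed(2007)                      # arbitrary seed; simulated data only
n   <- 1000
trt <- rep(c(0, 1), each = n / 2)   # hypothetical binary covariate
y   <- rbinom(n, 1, ifelse(trt == 1, 0.30, 0.20))

fit_id    <- glm(y ~ trt, family = binomial("identity"))
fit_logit <- glm(y ~ trt, family = binomial("logit"))

# Observed risks in the two groups
p0 <- mean(y[trt == 0])
p1 <- mean(y[trt == 1])

# Identity link: slope equals the absolute risk difference
c(model = unname(coef(fit_id)["trt"]), by_hand = p1 - p0)

# Logit link: exp(slope) equals the odds ratio
c(model   = unname(exp(coef(fit_logit)["trt"])),
  by_hand = (p1 / (1 - p1)) / (p0 / (1 - p0)))
```

Both models are saturated here, so they give the same fitted probabilities; only the scale on which the covariate effect is reported differs.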
>>
>>  	Terry Therneau
>>
>
>-- 
>Brian D. Ripley,                  ripley at stats.ox.ac.uk
>Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
>University of Oxford,             Tel:  +44 1865 272861 (self)
>1 South Parks Road,                     +44 1865 272866 (PA)
>Oxford OX1 3TG, UK                Fax:  +44 1865 272595