[R] logistic regression model with non-integer weights

Sun Apr 16 20:16:51 CEST 2006

On Sun, 2006-04-16 at 19:10 +0100, Ramón Casero Cañas wrote:

> Thanks for your suggestions, Michael. It took me some time to figure out
> how to do this in R (as trivial as it may be for others). Some comments
> about what I've done follow, in case anyone is interested.
> 
> The problem is a) abnormality is rare (Prevalence=14%) and b) there is
> not much difference in the independent variable between abnormal and
> normal. So the logistic regression model predicts that P(abnormal) <=
> 0.4. I got confused with this, as I expected a cut-off point of P=0.5 to
> decide between normal/abnormal. But you are right, in that another
> cut-off point can be chosen.
> 
> For a cut-off of e.g. P(abnormal)=0.15, Sensitivity=65% and
> Specificity=52%. They are pretty bad, although for clinical purposes I
> would say that Positive/Negative Predictive Values are more interesting.
> But then PPV=19% and NPV=90%, which isn't great. As an overall test of
> how good the model is for classification I have computed the area under
> the ROC, from your suggestion of using Sensitivity and Specificity.
> 
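For anyone reading this in the archive, here is a minimal sketch of how the four measures above can be computed at a chosen cut-off. The helper name class_measures and the simulated data are assumptions for illustration only; they are not the poster's code or data.

```r
# Hypothetical helper: classification measures at a probability cut-off.
# p: fitted probabilities, y: 0/1 outcome (1 = abnormal).
class_measures <- function(p, y, cutoff) {
  pred <- as.integer(p >= cutoff)
  tp <- sum(pred == 1 & y == 1)
  fp <- sum(pred == 1 & y == 0)
  tn <- sum(pred == 0 & y == 0)
  fn <- sum(pred == 0 & y == 1)
  c(sensitivity = tp / (tp + fn),
    specificity = tn / (tn + fp),
    ppv         = tp / (tp + fp),
    npv         = tn / (tn + fn))
}

# Simulated data (assumed): rare outcome, weakly informative predictor.
set.seed(1)
n <- 2000
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-1.9 + 0.4 * x))
fit <- glm(y ~ x, family = binomial)
print(round(class_measures(fitted(fit), y, cutoff = 0.15), 2))

# Cross-check of the rounded figures quoted above, via Bayes' rule.
sens <- 0.65; spec <- 0.52; prev <- 0.14
ppv <- sens * prev / (sens * prev + (1 - spec) * (1 - prev))
npv <- spec * (1 - prev) / (spec * (1 - prev) + (1 - sens) * prev)
print(round(c(ppv = ppv, npv = npv), 2))
```

With the rounded inputs, the cross-check gives a PPV of about 0.18 and an NPV of about 0.90, which agrees with the quoted 19% and 90% up to rounding. It also shows why PPV stays low here: it depends directly on the 14% prevalence.
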
> I couldn't find how to do this directly with R, so I implemented it
> myself (it's not difficult but I'm new here). I tried with package ROCR,
> but apparently it doesn't cover binary outcomes.
> 
> The area under the ROC is 0.64, so I would say that even though the
> model seems to fit the data, it just doesn't allow acceptable
> discrimination, no matter what the cut-off point.
> 
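One way to compute the area under the ROC curve by hand is the rank (Mann-Whitney) formulation sketched below. This is not necessarily how the poster implemented it; the function name auc_rank is made up for illustration.

```r
# Hypothetical helper: AUC as the probability that a randomly chosen
# abnormal case gets a higher fitted probability than a randomly chosen
# normal case (ties count as 1/2).
# p: fitted probabilities, y: 0/1 outcome (1 = abnormal).
auc_rank <- function(p, y) {
  r  <- rank(p)          # mid-ranks, so ties are handled
  n1 <- sum(y == 1)      # number of abnormal cases
  n0 <- sum(y == 0)      # number of normal cases
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Tiny example: 3 of the 4 (abnormal, normal) pairs are ordered correctly.
print(auc_rank(c(0.1, 0.2, 0.3, 0.4), c(0, 1, 0, 1)))
```

Because this depends only on the ranks of the fitted probabilities, it does not change with the choice of cut-off, and (for beta1 > 0) it is the same whether you pass fitted(fit) or the raw predictor.
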
> 
> I have also studied the effect of low prevalence. For this, I used
> option ran.gen in the boot function (package boot) to define a function
> that resamples the data so that it balances abnormal and normal cases.
> 
> A logistic regression model is fitted to each replicate of this
> parametric bootstrap, and from the replicates I compute the bias of the
> estimates of the model coefficients, beta0 and beta1. This shows very
> small bias for beta1, but a rather large bias for beta0.
> 
> So I would say that prevalence has an effect on beta0, but not beta1.
> This is good, because a common measure like the odds ratio depends only
> on beta1.
> 
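The pattern described above (intercept moves, slope does not) can be reproduced with a small simulation. This sketch does not use boot or ran.gen; it simply subsamples the normal cases once to balance the classes. The true coefficients, sample size and variable names are all assumptions.

```r
# Simulated data (assumed): true beta0 = -2, beta1 = 0.5, rare outcome.
set.seed(42)
n <- 20000
d <- data.frame(x = rnorm(n))
d$y <- rbinom(n, 1, plogis(-2 + 0.5 * d$x))

# Fit on the full sample, which has the natural (low) prevalence.
full <- coef(glm(y ~ x, family = binomial, data = d))

# Balance the classes: keep every abnormal case and draw an equal
# number of normal cases at random.
cases    <- which(d$y == 1)
controls <- sample(which(d$y == 0), length(cases))
bal <- coef(glm(y ~ x, family = binomial, data = d[c(cases, controls), ]))

# Sampling on the outcome shifts the intercept by about
# log(sampling fraction of cases / sampling fraction of controls),
# here log(1 / (n1 / n0)) = log(n0 / n1); the slope is left alone.
shift <- log(sum(d$y == 0) / length(cases))

print(rbind(full = full, balanced = bal))
print(c(observed_shift = unname(bal[1] - full[1]), expected_shift = shift))
```

If the balanced fit is ever used for prediction at the original prevalence, subtracting this shift from the balanced intercept is the usual correction.
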
> Cheers,
> 
> -- 
> Ramón Casero Cañas
> 
> http://www.robots.ox.ac.uk/~rcasero/wiki
> http://www.robots.ox.ac.uk/~rcasero/blog
> 

The Epi package has a function, ROC, that draws the ROC curve and
computes the AUC, among other things.

Rick B.