[R] Practical work with logistic regression

Frank E Harrell Jr f.harrell at Vanderbilt.Edu
Fri Apr 23 04:41:10 CEST 2010

Claus O'Rourke wrote:
> Dear all,
> I have a couple of short noob questions for whoever can take them. I'm
> from a very non-stats background so sorry for offending anybody with
> stupid questions! :-)
> I have been using logistic regression (via glm) to analyse a binary
> dependent variable against a couple of independent variables. All has
> gone well so far. In my work I have to compare the accuracy of this
> analysis with a C4.5 machine learning approach. With machine
> learning, a straightforward measure of the quality of the classifier
> is simply the percentage of correctly classified instances. I can
> calculate this for the resulting model by comparing its predictions
> with the original values 'manually'. My question: is this not
> calculated automatically (or at least easily) in the fitted model or
> in its summary?

The percent classified correctly is an improper scoring rule that will 
lead to the selection of a bogus model.  You can easily find examples 
where adding a very important variable to a binary logistic model 
results in a decrease in the percent "correct".
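
To make the point concrete, here is a small Java sketch (Java only because it is the language Claus mentions). The outcomes and the two sets of predicted probabilities are made up for illustration; they are not output from any real fit. It compares the percentage correct at an assumed 0.5 cutoff with two proper scoring rules, the Brier score and the mean negative log-likelihood (log loss). Percent correct only looks at which side of the cutoff each probability falls on, so it cannot reward the sharper probabilities of "model B".

```java
// Illustration with HYPOTHETICAL data: same outcomes, two sets of predicted
// probabilities, scored by percent correct and by two proper scoring rules.
public class ScoringRules {

    // Fraction of cases where the call (p >= cutoff) matches the 0/1 outcome.
    public static double percentCorrect(double[] p, int[] y, double cutoff) {
        int correct = 0;
        for (int i = 0; i < p.length; i++) {
            int predicted = p[i] >= cutoff ? 1 : 0;
            if (predicted == y[i]) {
                correct++;
            }
        }
        return (double) correct / p.length;
    }

    // Brier score: mean squared difference between probability and outcome.
    // Lower is better.
    public static double brierScore(double[] p, int[] y) {
        double sum = 0.0;
        for (int i = 0; i < p.length; i++) {
            double d = p[i] - y[i];
            sum += d * d;
        }
        return sum / p.length;
    }

    // Mean negative log-likelihood (log loss). Lower is better.
    // Assumes every probability is strictly between 0 and 1.
    public static double logLoss(double[] p, int[] y) {
        double sum = 0.0;
        for (int i = 0; i < p.length; i++) {
            sum -= (y[i] == 1) ? Math.log(p[i]) : Math.log(1.0 - p[i]);
        }
        return sum / p.length;
    }

    private static void report(String name, double[] p, int[] y) {
        System.out.printf("%s: correct=%.3f  Brier=%.4f  logloss=%.4f%n",
                name, percentCorrect(p, y, 0.5), brierScore(p, y), logLoss(p, y));
    }

    public static void main(String[] args) {
        int[] y = {1, 1, 1, 0, 0, 0};
        // Model A: every probability barely on the right side of 0.5.
        double[] modelA = {0.55, 0.55, 0.55, 0.45, 0.45, 0.45};
        // Model B: confident and mostly right, but one case falls below 0.5.
        double[] modelB = {0.95, 0.95, 0.45, 0.05, 0.05, 0.05};
        report("Model A", modelA, y);
        report("Model B", modelB, y);
    }
}
```

With these made-up numbers, model A classifies all six cases "correctly" while model B gets 5 of 6, yet model B has a much lower Brier score (about 0.05 vs 0.20) and a lower log loss (about 0.18 vs 0.60). Ranking the two by percent correct would pick model A.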


> I want to use my model in real time to produce results for new inputs.
> Basically this model is to be used as a classifier for a robot in real
> time. Can anyone suggest the best way to use a fitted model directly
> in external code once it has been developed in R?
> If my external code is in Java, then using JRI is one option. A more
> efficient method would be to take the intercept and coefficients and
> code up the function directly in the target programming language.
> Has anyone ever tried doing this?
> Apologies again for the stupid questions, but the sooner I get some of
> these things straight, the better.
> Claus
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
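
On the second question, hard-coding the coefficients is simple for a model with only numeric main effects. A minimal Java sketch follows, assuming the intercept and slopes have been copied from coef(fit) in R, in the same order as the model's predictors; the class name LogisticScorer and the numbers in main are hypothetical. Factor predictors, interactions, splines or other transformations would have to be expanded exactly as R's model matrix does before the results will agree with R. The sketch returns a probability rather than a 0/1 label, which leaves the choice of cutoff to the calling code.

```java
// Minimal sketch: evaluate a fitted binary logistic model outside R.
// The coefficient values in main are HYPOTHETICAL placeholders; real ones
// would come from coef(fit), intercept first, then slopes in model order.
public class LogisticScorer {
    private final double intercept;
    private final double[] coefficients;

    public LogisticScorer(double intercept, double[] coefficients) {
        this.intercept = intercept;
        this.coefficients = coefficients.clone();
    }

    // Linear predictor (log-odds): b0 + b1*x1 + ... + bk*xk.
    public double linearPredictor(double[] x) {
        if (x.length != coefficients.length) {
            throw new IllegalArgumentException(
                    "expected " + coefficients.length + " predictors, got " + x.length);
        }
        double eta = intercept;
        for (int i = 0; i < x.length; i++) {
            eta += coefficients[i] * x[i];
        }
        return eta;
    }

    // Inverse logit: P(Y = 1 | x) = 1 / (1 + exp(-eta)).
    // For very large |eta| Math.exp gives 0 or Infinity, so the result
    // saturates at 1.0 or 0.0 instead of producing NaN.
    public double probability(double[] x) {
        return 1.0 / (1.0 + Math.exp(-linearPredictor(x)));
    }

    public static void main(String[] args) {
        // Hypothetical fit with two numeric predictors.
        LogisticScorer model = new LogisticScorer(-1.5, new double[] {0.8, 2.0});
        double p = model.probability(new double[] {1.0, 0.5});
        System.out.println("P(Y = 1) = " + p);
    }
}
```

A quick sanity check is to compare probability() on a few rows with predict(fit, newdata, type = "response") in R; any disagreement usually points to a predictor that was coded or ordered differently.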

Frank E Harrell Jr   Professor and Chairman        School of Medicine
                      Department of Biostatistics   Vanderbilt University
