[R] Graphical presentation of logistic regression

Thu Sep 15 14:43:20 CEST 2005

If a graphical presentation provides improved insight, then that is
sufficient justification.  The existence of "better", more precise methods
does not change that.

I, too, sometimes use jitter() to avoid overplotting of observations, but I
think the dot-plots in de la Cruz's code are even better.  It is the
histogram that is misleading (due to paucity of data), not the effort to
elucidate the joint behavior of zeros and ones. 
http://www.esapubs.org/bulletin/backissues/086-1/bulletinjan2005.htm#et

Please try a variation that his code provides:

plot.logi.hist(independ = altitude, depend = tree, logi.mod = 1,
               type = "dit", boxp = TRUE, rug = TRUE, las.h = 1)

which does not use the histograms but instead uses "dit plots" to provide a
helpful, visceral feel for the behavior of the observations.
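
For readers who do not have de la Cruz's function at hand, the general idea
can be roughed out in base R.  This is only a sketch and not his code: the
data are simulated stand-ins that borrow the names altitude and tree, and
the helper names (grid, step, stack_rank, dot_y) are my own assumptions.
Tied observations are stacked instead of overplotted, zeros growing upward
from 0 and ones downward from 1, with the fitted logistic curve on top.

```r
# Simulated stand-ins for 'altitude' and 'tree' (illustrative, not real data).
set.seed(1)
altitude <- round(runif(120, 0, 2000), -2)   # rounded so that values repeat
tree <- rbinom(120, 1, plogis(-3 + 0.003 * altitude))

# Simple logistic regression of presence/absence on altitude.
fit <- glm(tree ~ altitude, family = binomial)

# Fitted probability over the observed range of the predictor.
grid <- data.frame(altitude = seq(min(altitude), max(altitude),
                                  length.out = 200))
grid$p <- predict(fit, newdata = grid, type = "response")

plot(grid$altitude, grid$p, type = "l", ylim = c(0, 1), las = 1,
     xlab = "altitude", ylab = "Probability")

# Rank each observation within its (tree, altitude) group: 1, 2, 3, ...
stack_rank <- ave(altitude, tree, altitude, FUN = seq_along)

# Stack ties: zeros upward from 0, ones downward from 1.
step <- 0.02                                  # vertical spacing (assumed)
dot_y <- ifelse(tree == 1,
                1 - (stack_rank - 1) * step,
                (stack_rank - 1) * step)
points(altitude, dot_y, pch = 16, cex = 0.6)
rug(altitude)
```

The spacing in step is arbitrary; with many ties at one value the stacks
could run into the curve, so it would need adjusting for real data.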

Charles Annis, P.E.

Charles.Annis at StatisticalEngineering.com
phone: 561-352-9699
eFax:  614-455-3265
http://www.StatisticalEngineering.com

-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch
[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Jari Oksanen
Sent: Thursday, September 15, 2005 3:17 AM
To: Frank E Harrell Jr
Cc: r-help at stat.math.ethz.ch; Beale, Colin
Subject: Re: [R] Graphical presentation of logistic regression

On Wed, 2005-09-14 at 06:29 -0500, Frank E Harrell Jr wrote:
> Beale, Colin wrote:
> > Hi,
> > 
> > I wonder if anyone has written any code to implement the suggestions of
> > Smart et al (2004) in the Bulletin of the Ecological Society of America
> > for a new way of graphically presenting the results of logistic
> > regression (see
> > www.esapubs.org/bulletin/backissues/085-3/bulletinjuly2004_2column.htm#tools1
> > for the full text)? I couldn't find anything relating to this sort
> > of graphical representation of logistic models in the archives, but
> > maybe someone has solved it already? In short, Smart et al suggest that
> > a logistic regression be presented as a combination of the two
> > histograms for successes and failures (with one presented upside down at
> > the top of the figure, the other the right way up at the bottom)
> > overlaid by the probability function (ie logistic curve). It's somewhat
> > hard to describe, but is nicely illustrated in the full text version
> > above. I think it is a sensible way of presenting these results and am
> > keen to do so - at the moment I can only do this by generating the two
> > histograms and the logistic curve separately (using hist() and lines()),
> > then copying and pasting the graphs out of R and inverting one in a
> > graphics package, before overlaying the others. I'm sure this could be
> > done within R and would be a handy plotting function to develop. Has
> > anyone done so, or can anyone give me any pointers to doing this? I
> > really need to know how to invert a histogram and how to overlay this
> > with another histogram "the right way up".
> > 
> > Any thoughts would be welcome.
> > 
> > Thanks in advance,
> > Colin
> 
>  From what you describe, that is a poor way to represent the model 
> except for judging discrimination ability (if the model is calibrated 
> well).  Effect plots, odds ratio charts, and nomograms are better.  See 
> the Design package for details.
> 

You're correct when you say that this is a poor way to represent the
model. However, you should have some understanding for us ecologists, who
are simple creatures working with tangible subjects such as animals and
plants (microbiologists work with less tangible things). Therefore we
want to have a concrete and simple representation. After all, the
example was about occurrence of an animal against a concrete
environmental variable, and a concrete representation was suggested.
Nomograms and the like are abstractions that you understand only after
long education and training (I tried the Design package and I didn't
understand the nomogram plot). 

I tried one concrete example with my own data, and the inverted
histogram method was patently misleading (with Baz Rowlingson's neat and
compact code, sorry for the repetition). The method would be useful with
dense and regular data only; with my data the clearest visual cue was the
uneven sampling intensity. With my limited knowledge of R facilities, I
can now remember only two ways to preserve the concreteness of display
in base R: jitter() to avoid overplotting of observations, and
sunflowerplot() to show the amount of overplotting.
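
A minimal sketch of those two options, side by side.  The data are
simulated for illustration only, and the names x, y, y_jit and sf are
assumptions, not anything from the thread.  The jitter amount of 0.05 is
an arbitrary choice.

```r
# Simulated data for illustration only: a discrete predictor with many ties
# and a binary response.
set.seed(2)
x <- sample(1:10, 200, replace = TRUE)
y <- rbinom(200, 1, plogis(-2.5 + 0.5 * x))

op <- par(mfrow = c(1, 2))

# jitter(): add a little vertical noise so tied points no longer coincide.
y_jit <- jitter(y, amount = 0.05)
plot(x, y_jit, pch = 16, cex = 0.6, las = 1,
     xlab = "x", ylab = "y (jittered)")

# sunflowerplot(): one symbol per distinct (x, y) pair, with one petal per
# coincident observation; the counts are returned invisibly in sf$number.
sf <- sunflowerplot(x, y, las = 1, xlab = "x", ylab = "y")

par(op)
```

Jittering keeps every observation visible but moves it slightly off its
true value, whereas the sunflowers stay at the true coordinates and encode
the multiplicity in the petals.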

I think the Ecological Society of America would be happy to receive papers
suggesting better ways to represent binary response data, if some of the
knowledgeable persons in this group would decide to educate them (I'm
not an ESA member, so I wouldn't be educated: therefore 'them' instead
of 'us'). The ESA bulletin will be influential for manuscripts submitted
to the Society journals in the future, and the time for action is now.

cheers, jari oksanen
-- 
Jari Oksanen -- Dept Biology, Univ Oulu, 90014 Oulu, Finland
Ph. +358 8 5531526, cell +358 40 5136529, fax +358 8 5531061
email jari.oksanen at oulu.fi, homepage http://cc.oulu.fi/~jarioksa/
