[R] different results with plot.lm vs. plot.lm(which=c(2))

Greg Snow Greg.Snow at imail.org
Wed Nov 12 23:54:59 CET 2008


Just a clarification on one of my statements below.  I realize on rereading my statement on R-core paying attention that it could be interpreted as a possible criticism.  That is not how it was intended, rather that I have seen cases in the past where a discussion confirms that something is a bug and a member of R-core already has the fix started or in place before a formal bug report could be sent and so we should not send a bug report on this unless it was clear that one was needed (which it is not, I received an e-mail off line that Prof. Ripley has started looking into it).


--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111


> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-
> project.org] On Behalf Of Greg Snow
> Sent: Wednesday, November 12, 2008 1:07 PM
> To: Effie Greathouse; r-help at r-project.org
> Subject: Re: [R] different results with plot.lm vs. plot.lm(which=c(2))
>
> The same thing is affecting plot 2 as well.  Basically in the code
> there is line early on:
>
> r <- residuals(x)
>
> which gets the residuals and by default for glm models those are the
> deviance residuals.
>
> A bit latter is the code (after some optional modifications to r):
>
>     if (any(show[2:3])) {
>         ylab23 <- if (isGlm)
>             "Std. deviance resid."
>         else "Standardized residuals"
>         r.w <- if (is.null(w))
>             r
>         else sqrt(w) * r
>     }
>
> Which results in r.w being the residuals (or resids times the square
> root of w) if either plot 2 or 3 is requested.
>
> The next batch of code is:
>
>     if (show[5]) {
>         ylab5 <- if (isGlm)
>             "Std. Pearson resid."
>         else "Standardized residuals"
>         r.w <- residuals(x, "pearson")
>         if (!is.null(w))
>             r.w <- r.w[wind]
>     }
>
> Which changes r.w to the pearson residuls if plot 5 is requested, it
> also sets the label to use for plot 5, but does not change the label
> (ylab23) to be used for plots 2 and 3 and so they still are labeled as
> deviance residuals.
>
> The r.w variable is used latter in the code for plots 2, 3, and 5 (and
> maybe others).  Unless there is something else in between the above
> code and where r.w is used that fixes this (I did not see anything, but
> did not do more than skim the code) then there is a clear bug in that
> the residuals being used is inconsistent and mis-labeled in some cases.
>
> If anyone on R-core is paying attention to this thread, then the
> problem may already be in the process of being fixed (or it may already
> be fixed, check r-patched, the above is based on R 2.8.0 (2008-10-20)),
> I've had a couple of cases where the fix was in place before I could
> submit a bug report.  If we don't see any indication of it being fixed
> in the next couple of days, then one of us should submit a bug report.
>
>
> --
> Gregory (Greg) L. Snow Ph.D.
> Statistical Data Center
> Intermountain Healthcare
> greg.snow at imail.org
> 801.408.8111
>
>
> > -----Original Message-----
> > From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-
> > project.org] On Behalf Of Effie Greathouse
> > Sent: Wednesday, November 12, 2008 12:15 PM
> > To: r-help at r-project.org
> > Subject: Re: [R] different results with plot.lm vs.
> plot.lm(which=c(2))
> >
> > There's also a big difference in plot 2 (Normal Q-Q) in my real data,
> > but I
> > don't see a real difference in plot 2 for the example I sent, except
> > that
> > some of the outlier labels are different.  Would the residuals being
> > plotted
> > likely be the cause of the difference in plot 2 as well?  Dr. Snow,
> > would
> > you want to be able to see the plot 2 graphs for my real data?  I
> don't
> > know
> > how to save plot 2 when I'm clicking through the plot(model) graphs,
> so
> > I
> > can't just send it to you.  Thanks for checking this out Dr. Snow!
> >
> > On Wed, Nov 12, 2008 at 11:05 AM, Greg Snow <Greg.Snow at imail.org>
> > wrote:
> >
> > > From a quick look at the code it looks like when you ask for plot
> > number 5
> > > (included in default when 'which' is not specified), then the
> > deviance
> > > residuals are replaced by the pearson residuals to be used in later
> > > computations.  So the difference that you are seeing is that one of
> > the
> > > plots is based on deviance residuals and the other on pearson
> > residuls.
> > >
> > > It seems that there is a bug here in that, at a minimum, the label
> > should
> > > be changed to indicate which residuals were actually used, or the
> > code
> > > changed to continue to use the deviance residuals for plot 3 even
> > when plot
> > > 5 is requested.
> > >
> > > Does anyone else see something that I missed in how the residuals
> are
> > > replaced and used?
> > >
> > > --
> > > Gregory (Greg) L. Snow Ph.D.
> > > Statistical Data Center
> > > Intermountain Healthcare
> > > greg.snow at imail.org
> > > 801.408.8111
> > >
> > >
> > > > -----Original Message-----
> > > > From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-
> > > > project.org] On Behalf Of Effie Greathouse
> > > > Sent: Wednesday, November 12, 2008 11:16 AM
> > > > To: r-help at r-project.org
> > > > Subject: Re: [R] different results with plot.lm vs.
> > plot.lm(which=c(2))
> > > >
> > > > Hi Dr. Ripley--Sorry for the repost everybody.  The original
> > message I
> > > > sent
> > > > never showed up in my inbox, so I thought it didn't get sent to
> the
> > > > list.
> > > >
> > > > I'm running R 2.8.0, installed from a pre-compiled version, on
> > Windows
> > > > XP.
> > > > When I type Sys.getlocale() at the R prompt, it returns:
> > > >  "LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
> > > > States.1252;LC_MONETARY=English_United
> > > > States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252"
> > > >
> > > > Here's an example:
> > > > bob <- seq(1:100)
> > > > bob2 <- rgamma(100, 2, 1)*10+bob
> > > > model<-glm(bob2 ~ bob, family=Gamma)
> > > >
> > > > Then enter:
> > > > plot(model, which=c(3))
> > > > to get the Scale-Location graph
> > > >
> > > > Then compare it to the Scale-Location graph when you run the
> > following
> > > > command and page through to the 3rd graph:
> > > > plot(model)
> > > >
> > > > When I do this, I get different results -- some of the high
> values
> > are
> > > > different on each plot.  On my real data the difference is more
> > severe
> > > > than
> > > > in this randomly generated example.  I'd be happy to supply my
> real
> > > > data and
> > > > R code if this smaller example isn't sufficient.  Thank you for
> any
> > > > help!!
> > > >
> > > >
> > > > On Wed, Nov 12, 2008 at 9:43 AM, Prof Brian Ripley
> > > > <ripley at stats.ox.ac.uk>wrote:
> > > >
> > > > > Instead of re-posting the same message, please study the
> posting
> > > > guide and
> > > > > supply the information asked for, including a reproducible
> > example.
> > > > There is
> > > > > no way we can help you unless you help us to help you.
> > > > >
> > > > >
> > > > > On Wed, 12 Nov 2008, Effie Greathouse wrote:
> > > > >
> > > > >   I am running GLM models using the gamma family.  For example:
> > > > >> model <-glm(y ~ x, family=Gamma(link="identity"))
> > > > >>
> > > > >> I am getting different results for the normal Q-Q plot and the
> > > > >> Scale-Location plot if I run the diagnostic plots without
> > specifying
> > > > the
> > > > >> plot vs. if I specify the plot ... e.g., "plot(model)" gives
> me
> > a
> > > > >> different
> > > > >> Normal Q-Q graph than "plot(model, which=c(2))".  The former
> > gives
> > > > data
> > > > >> points distributed in a quadratic pattern, while the latter
> > gives
> > > > data
> > > > >> points more or less along the 1:1 line.  Shouldn't these two
> > > > commands be
> > > > >> giving me the same exact graphs?  I have read the
> documentation
> > on
> > > > plot.lm
> > > > >> and searched the help archives, but I am still learning GLM's
> > and
> > > > I'm not
> > > > >> very familiar with understanding diagnostic plots for GLM's,
> so
> > any
> > > > help
> > > > >> would be much appreciated!
> > > > >>
> > > > >>        [[alternative HTML version deleted]]
> > > > >>
> > > > >> ______________________________________________
> > > > >> R-help at r-project.org mailing list
> > > > >> https://stat.ethz.ch/mailman/listinfo/r-help
> > > > >> PLEASE do read the posting guide
> > > > >> http://www.R-project.org/posting-guide.html<http://www.r-
> > project.org/posting-guide.html>
> > > <http://www.r-
> > >  > project.org/posting-guide.html>
> > > > >> and provide commented, minimal, self-contained, reproducible
> > code.
> > > > >>
> > > > >
> > > > > --
> > > > > Brian D. Ripley,                  ripley at stats.ox.ac.uk
> > > > > Professor of Applied Statistics,
> > http://www.stats.ox.ac.uk/~ripley/
> > > > > University of Oxford,             Tel:  +44 1865 272861 (self)
> > > > > 1 South Parks Road,                     +44 1865 272866 (PA)
> > > > > Oxford OX1 3TG, UK                Fax:  +44 1865 272595
> > > > >
> > > >
> > > >         [[alternative HTML version deleted]]
> > > >
> > > > ______________________________________________
> > > > R-help at r-project.org mailing list
> > > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > > PLEASE do read the posting guide http://www.R-
> project.org/posting-
> > <http://www.r-project.org/posting->
> > > > guide.html
> > > > and provide commented, minimal, self-contained, reproducible
> code.
> > >
> >
> >         [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-
> > guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-
> guide.html
> and provide commented, minimal, self-contained, reproducible code.



More information about the R-help mailing list