[R] different results with plot.lm vs. plot.lm(which=c(2))

Greg Snow Greg.Snow at imail.org
Wed Nov 12 21:07:23 CET 2008


The same thing is affecting plot 2 as well.  Basically in the code there is line early on:

r <- residuals(x)

which gets the residuals and by default for glm models those are the deviance residuals.

A bit latter is the code (after some optional modifications to r):

    if (any(show[2:3])) {
        ylab23 <- if (isGlm)
            "Std. deviance resid."
        else "Standardized residuals"
        r.w <- if (is.null(w))
            r
        else sqrt(w) * r
    }

Which results in r.w being the residuals (or resids times the square root of w) if either plot 2 or 3 is requested.

The next batch of code is:

    if (show[5]) {
        ylab5 <- if (isGlm)
            "Std. Pearson resid."
        else "Standardized residuals"
        r.w <- residuals(x, "pearson")
        if (!is.null(w))
            r.w <- r.w[wind]
    }

Which changes r.w to the pearson residuls if plot 5 is requested, it also sets the label to use for plot 5, but does not change the label (ylab23) to be used for plots 2 and 3 and so they still are labeled as deviance residuals.

The r.w variable is used latter in the code for plots 2, 3, and 5 (and maybe others).  Unless there is something else in between the above code and where r.w is used that fixes this (I did not see anything, but did not do more than skim the code) then there is a clear bug in that the residuals being used is inconsistent and mis-labeled in some cases.

If anyone on R-core is paying attention to this thread, then the problem may already be in the process of being fixed (or it may already be fixed, check r-patched, the above is based on R 2.8.0 (2008-10-20)), I've had a couple of cases where the fix was in place before I could submit a bug report.  If we don't see any indication of it being fixed in the next couple of days, then one of us should submit a bug report.


--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111


> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-
> project.org] On Behalf Of Effie Greathouse
> Sent: Wednesday, November 12, 2008 12:15 PM
> To: r-help at r-project.org
> Subject: Re: [R] different results with plot.lm vs. plot.lm(which=c(2))
>
> There's also a big difference in plot 2 (Normal Q-Q) in my real data,
> but I
> don't see a real difference in plot 2 for the example I sent, except
> that
> some of the outlier labels are different.  Would the residuals being
> plotted
> likely be the cause of the difference in plot 2 as well?  Dr. Snow,
> would
> you want to be able to see the plot 2 graphs for my real data?  I don't
> know
> how to save plot 2 when I'm clicking through the plot(model) graphs, so
> I
> can't just send it to you.  Thanks for checking this out Dr. Snow!
>
> On Wed, Nov 12, 2008 at 11:05 AM, Greg Snow <Greg.Snow at imail.org>
> wrote:
>
> > From a quick look at the code it looks like when you ask for plot
> number 5
> > (included in default when 'which' is not specified), then the
> deviance
> > residuals are replaced by the pearson residuals to be used in later
> > computations.  So the difference that you are seeing is that one of
> the
> > plots is based on deviance residuals and the other on pearson
> residuls.
> >
> > It seems that there is a bug here in that, at a minimum, the label
> should
> > be changed to indicate which residuals were actually used, or the
> code
> > changed to continue to use the deviance residuals for plot 3 even
> when plot
> > 5 is requested.
> >
> > Does anyone else see something that I missed in how the residuals are
> > replaced and used?
> >
> > --
> > Gregory (Greg) L. Snow Ph.D.
> > Statistical Data Center
> > Intermountain Healthcare
> > greg.snow at imail.org
> > 801.408.8111
> >
> >
> > > -----Original Message-----
> > > From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-
> > > project.org] On Behalf Of Effie Greathouse
> > > Sent: Wednesday, November 12, 2008 11:16 AM
> > > To: r-help at r-project.org
> > > Subject: Re: [R] different results with plot.lm vs.
> plot.lm(which=c(2))
> > >
> > > Hi Dr. Ripley--Sorry for the repost everybody.  The original
> message I
> > > sent
> > > never showed up in my inbox, so I thought it didn't get sent to the
> > > list.
> > >
> > > I'm running R 2.8.0, installed from a pre-compiled version, on
> Windows
> > > XP.
> > > When I type Sys.getlocale() at the R prompt, it returns:
> > >  "LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
> > > States.1252;LC_MONETARY=English_United
> > > States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252"
> > >
> > > Here's an example:
> > > bob <- seq(1:100)
> > > bob2 <- rgamma(100, 2, 1)*10+bob
> > > model<-glm(bob2 ~ bob, family=Gamma)
> > >
> > > Then enter:
> > > plot(model, which=c(3))
> > > to get the Scale-Location graph
> > >
> > > Then compare it to the Scale-Location graph when you run the
> following
> > > command and page through to the 3rd graph:
> > > plot(model)
> > >
> > > When I do this, I get different results -- some of the high values
> are
> > > different on each plot.  On my real data the difference is more
> severe
> > > than
> > > in this randomly generated example.  I'd be happy to supply my real
> > > data and
> > > R code if this smaller example isn't sufficient.  Thank you for any
> > > help!!
> > >
> > >
> > > On Wed, Nov 12, 2008 at 9:43 AM, Prof Brian Ripley
> > > <ripley at stats.ox.ac.uk>wrote:
> > >
> > > > Instead of re-posting the same message, please study the posting
> > > guide and
> > > > supply the information asked for, including a reproducible
> example.
> > > There is
> > > > no way we can help you unless you help us to help you.
> > > >
> > > >
> > > > On Wed, 12 Nov 2008, Effie Greathouse wrote:
> > > >
> > > >   I am running GLM models using the gamma family.  For example:
> > > >> model <-glm(y ~ x, family=Gamma(link="identity"))
> > > >>
> > > >> I am getting different results for the normal Q-Q plot and the
> > > >> Scale-Location plot if I run the diagnostic plots without
> specifying
> > > the
> > > >> plot vs. if I specify the plot ... e.g., "plot(model)" gives me
> a
> > > >> different
> > > >> Normal Q-Q graph than "plot(model, which=c(2))".  The former
> gives
> > > data
> > > >> points distributed in a quadratic pattern, while the latter
> gives
> > > data
> > > >> points more or less along the 1:1 line.  Shouldn't these two
> > > commands be
> > > >> giving me the same exact graphs?  I have read the documentation
> on
> > > plot.lm
> > > >> and searched the help archives, but I am still learning GLM's
> and
> > > I'm not
> > > >> very familiar with understanding diagnostic plots for GLM's, so
> any
> > > help
> > > >> would be much appreciated!
> > > >>
> > > >>        [[alternative HTML version deleted]]
> > > >>
> > > >> ______________________________________________
> > > >> R-help at r-project.org mailing list
> > > >> https://stat.ethz.ch/mailman/listinfo/r-help
> > > >> PLEASE do read the posting guide
> > > >> http://www.R-project.org/posting-guide.html<http://www.r-
> project.org/posting-guide.html>
> > <http://www.r-
> >  > project.org/posting-guide.html>
> > > >> and provide commented, minimal, self-contained, reproducible
> code.
> > > >>
> > > >
> > > > --
> > > > Brian D. Ripley,                  ripley at stats.ox.ac.uk
> > > > Professor of Applied Statistics,
> http://www.stats.ox.ac.uk/~ripley/
> > > > University of Oxford,             Tel:  +44 1865 272861 (self)
> > > > 1 South Parks Road,                     +44 1865 272866 (PA)
> > > > Oxford OX1 3TG, UK                Fax:  +44 1865 272595
> > > >
> > >
> > >         [[alternative HTML version deleted]]
> > >
> > > ______________________________________________
> > > R-help at r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide http://www.R-project.org/posting-
> <http://www.r-project.org/posting->
> > > guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> >
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-
> guide.html
> and provide commented, minimal, self-contained, reproducible code.



More information about the R-help mailing list