[Rd] plot(<lm>): new behavior in R-2.2.0 alpha
Martin Maechler
maechler at stat.math.ethz.ch
Wed Sep 14 10:55:48 CEST 2005
Thank you, John, for
Dear
>>>>> "JohnF" == John Fox <jfox at mcmaster.ca>
>>>>> on Tue, 13 Sep 2005 16:41:28 -0400 writes:
JohnF> A couple of comments on the new plots (numbers 5 and 6):
JohnF> Perhaps some more thought could be given to the
JohnF> plotted contours for Cook's D (which are 0.5 and 1.0
JohnF> in the example -- large Cook's Ds). A rule-of-thumb
JohnF> cut-off for this example is 4/(n - p) = 4/(50 - 5) =
JohnF> 0.089, and the discrepancy will grow with n.
That's an interesting suggestion. Where does the 4/(n-p) come
from? or put differently, should I better read in of your books? ;-)
Honestly, I'm so much a fan of R_i / h_ii that I didn't even
know that.
JohnF> I'm not terribly fond of number 6, since it seems
JohnF> natural to me to think of the relationship among
JohnF> these quantities as influence on coefficients =
JohnF> leverage * outlyingness (which corresponds to 5);
JohnF> also note how in the example, the labels for large
JohnF> residuals overplot.
I think John mainly proposed '6' because other proposed it as another
good alternative. From the few examples I've looked at, I
haven't got fond at all either.
JohnF> Finally, your remarks about balanced data are cogent
JohnF> and suggest going with 1:3 in this case (since R_i
JohnF> vs. i is pretty redundant with the QQ plot).
Ah, that's another, maybe better alternative to my proposal.
One drawback of it is for situations where people do something
like par(mfrow=c(2,2))
before calling plot(<lm>) for several fitted lm models,
assuming to fill one page for each of the plots.
and I think that's something I would have done always in such
situations where several different models are fitted and compared.
Maybe plot.lm() should "advance an empty frame" as soon as
prod(par("mfrow")) >= 4
in that case?
Martin
JohnF> --------------------------------
JohnF> John Fox
JohnF> Department of Sociology
JohnF> McMaster University
JohnF> Hamilton, Ontario
JohnF> Canada L8S 4M4
JohnF> 905-525-9140x23604
JohnF> http://socserv.mcmaster.ca/jfox
JohnF> --------------------------------
>> -----Original Message-----
>> From: Martin Maechler [mailto:maechler at stat.math.ethz.ch]
>> Sent: Tuesday, September 13, 2005 9:18 AM
>> To: R-devel at stat.math.ethz.ch
>> Cc: John Maindonald; Werner Stahel; John Fox
>> Subject: plot(<lm>): new behavior in R-2.2.0 alpha
>>
>> As some of you R-devel readers may know, the plot() method
>> for "lm" objects is based in large parts on contributions by
>> John Maindonald, subsequently "massaged" by me and other
>> R-core members.
>>
>> In the statistics litterature on applied regression, people
>> have had diverse oppinions on what (and how many!) plots
>> should be used for goodness-of-fit / residual diagnostics,
>> and to my knowledge most people have agreed to want to see
>> one (or more) version of a Tukey-Anscombe plot {Residuals ~
>> Fitted} and a QQ normal plot.
>> Another consideration was to be somewhat close to what S
>> (S-plus) was doing. So we have two versions of residuals vs
>> fitted, one for checking E[error] = 0, the other for
>> checking Var[error] = constant. So we got to the first three plots of
>> plot.lm() about which I don't want to debate at the moment
>> {though, there's room for improvement even there: e.g., I
>> know of at least one case where plot(<lm>) wasn't used
>> because the user was missing the qqline() she was so used to
>> in the QQ plot}
>>
>> The topic of this e-mail is the (default) 4th plot which I
>> had changed; really prompted by the following:
>> More than three months ago, John wrote
>> http://tolstoy.newcastle.edu.au/R/devel/05/04/0594.html
>> (which became a thread of about 20 messages, from Apr.23
>> -- 29, 2005)
>>
>> and currently,
>> NEWS for R 2.2.0 alpha contains
>>
>> >> USER-VISIBLE CHANGES
>> >>
>> >> o plot(<lm object>) uses a new default for the fourth panel when
>> >> 'which' is not specified.
>> >> ___ may change before release ___
>>
>> and the header is
>>
>> plot.lm <-
>> function (x, which = c(1:3, 5),
>> caption = c("Residuals vs Fitted",
>> "Normal Q-Q", "Scale-Location",
>> "Cook's distance", "Residuals vs Leverage",
>> "Cook's distance vs Leverage"),
>> ......... ) {..............}
>>
>> So we now have 6 possible plots, where 1,2,3 and 5 are the
>> defaults (and 1,2,3,4 where the old defaults).
>>
>> For the influential points and combination of 'influential'
>> and 'outlier'
>> there have been quite a few more proposals in the past. R <=
>> 2.1.x has been plotting the Cook's distances vs. observation
>> number, whereas quite a few people in the past have noted
>> that all influence measures being more or less complicated
>> functions of residuals and "hat values" aka "leverages",
>> (R_i, h_{ii}), it would really make sense and fit more to the
>> other plots to plot residuals vs. Leverages --- with the
>> additional idea of adding *contours* of (equal) Cook's
>> distances to that plot, in case one would really want to seem them.
>>
>> In the mean time, this has been *active* in R-devel for quite
>> a while, and we haven't received any new comments.
>>
>> One remaining problem I'd like to address is the "balanced AOV"
>> situation, something probably pretty rare nowadays in real
>> practice, but common of course in teaching ANOVA.
>> As you may remember, in a balanced design, all observations
>> have the same leverages h_{ii}, and the plot R_i vs h_ii
>> is really not so useful. In that case, the cook distances
>> CD_i = c * R_i ^2 and so CD_i vs i {the old "4-th plot in
>> plot.lm"} is
>> graphically identical to R_i^2 vs i.
>> Now in that case (of identical h_ii's), I think one would
>> really want "R_i vs i".
>>
>> Question to the interested parties:
>>
>> Should there be an automatism
>> ``when h_ii == const'' {"==" with a bit of numerical fuzz}
>> plot a) R_i vs i
>> or b) CD_i vs i
>>
>> or should users have to manually use
>> plot(<lm>, which=1:4, ...)
>> in such a case?
>>
>> Feedback very welcome,
>> particularly, you first look at the examples in help(plot.lm)
>> in *R-devel* aka R-2.2.0 alpha.
>>
>> Martin Maechler, ETH Zurich
More information about the R-devel
mailing list