[Rd] Enhanced version of plot.lm()

Martin Maechler maechler at stat.math.ethz.ch
Wed Apr 27 17:30:06 CEST 2005


>>>>> "PD" == Peter Dalgaard <p.dalgaard at biostat.ku.dk>
>>>>>     on 27 Apr 2005 16:54:02 +0200 writes:

    PD> Martin Maechler <maechler at stat.math.ethz.ch> writes:
    >> I'm about to commit the current proposal(s) to R-devel,
    >> **INCLUDING** changing the default from 
    >> 'which = 1:4' to 'which = c(1:3,5)'
    >> 
    >> and elicit feedback starting from there.
    >> 
    >> One thing I think I would like is to use color for the Cook's
    >> contours in the new 4th plot.

    PD> Hmm. First try running example(plot.lm) with the modified function and
    PD> tell me which observation has the largest Cook's D. With the suggested
    PD> new 4th plot it is very hard to tell whether obs #49 is potentially or
    PD> actually influential. Plots #1 and #3 are very close to conveying the
    PD> same information though...

I shouldn't be teaching here, and I know that I'm getting into contested
territory (regression diagnostics, robustness, "The" Truth, etc., etc.),
but I believe there is no unique way to define "actually influential"
(hence I don't believe it is extremely useful to know
exactly which Cook's D is largest).

That is partly because there are many statistics that can be derived
from a multiple regression fit, all of which are influenced in some way.
AFAIK, all observation-influence measures g(i) are functions of
(r_i, h_{ii}), and these two quantities are the ones that "regression
users" should really know {without consulting a textbook} and that
generalize {e.g. to "linear smoothers" such as gam()s (for a
"non-estimated" smoothing parameter)}.
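For Cook's distance the dependence on (r_i, h_{ii}) can be written out
explicitly: D_i = r_i^2 / p * h_{ii} / (1 - h_{ii}), where r_i is the
internally standardized residual and p the number of coefficients.
Solving D = c for r gives the Cook's contours drawn against leverage,
|r| = sqrt(c * p * (1 - h) / h). What follows is a hypothetical Python
sketch of those two formulas, not code taken from plot.lm(); the
function names are made up for illustration.

```python
import math

def cooks_distance(r, h, p):
    """Cook's distance of one observation, computed only from its
    internally standardized residual r, its leverage h (diagonal of
    the hat matrix) and the number p of model coefficients."""
    if not 0.0 <= h < 1.0:
        raise ValueError("leverage h must lie in [0, 1)")
    return (r * r / p) * (h / (1.0 - h))

def cooks_contour(level, p, leverages):
    """For each leverage h in (0, 1), the |standardized residual| at
    which Cook's distance equals `level`: sqrt(level * p * (1 - h) / h)."""
    return [math.sqrt(level * p * (1.0 - h) / h) for h in leverages]
```

For example, cooks_distance(2.0, 0.5, 2) gives 2.0. Feeding any point
of cooks_contour(c, p, hs) back into cooks_distance() returns c, up to
rounding. No other information about the fit is needed.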

Martin
