[Rd] Enhanced version of plot.lm()
Martin Maechler
maechler at stat.math.ethz.ch
Wed Apr 27 17:30:06 CEST 2005
>>>>> "PD" == Peter Dalgaard <p.dalgaard at biostat.ku.dk>
>>>>> on 27 Apr 2005 16:54:02 +0200 writes:
PD> Martin Maechler <maechler at stat.math.ethz.ch> writes:
>> I'm about to commit the current proposal(s) to R-devel,
>> **INCLUDING** changing the default from
>> 'which = 1:4' to 'which = c(1:3,5)'
>>
>> and elicit feedback starting from there.
>>
>> One thing I think I would like is to use color for the Cook's
>> contours in the new 4th plot.
PD> Hmm. First try running example(plot.lm) with the modified function and
PD> tell me which observation has the largest Cook's D. With the suggested
PD> new 4th plot it is very hard to tell whether obs #49 is potentially or
PD> actually influential. Plots #1 and #3 are very close to conveying the
PD> same information though...
I shouldn't be teaching here, and I know that I'm getting into contested
territory (regression diagnostics; robustness; "The" Truth, etc., etc.)
but I believe there is no unique way to define "actually influential"
(hence I don't believe that it's extremely useful to know
exactly which Cook's D is largest).
That is partly because many statistics can be derived from a
multiple regression fit, all of which are influenced in some way.
AFAIK, all observation-influence measures g(i) are functions of
(r_i, h_{ii}) and the latter are the quantities that "regression
users" should really know {without consulting a text book} and
that are generalizable {e.g. to "linear smoothers" such as
gam()s (for "non-estimated" smoothing parameter)}.
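To illustrate the point for one such measure, here is a minimal sketch
(not part of the proposed plot.lm() change) that rebuilds Cook's distance
from the standardized residuals r_i and the leverages h_{ii} alone. It
assumes the LifeCycleSavings model used in example(plot.lm); the variable
names fit, h, r, p and D are chosen here for illustration only.

```r
# Model assumed to match the one in example(plot.lm);
# LifeCycleSavings ships with R in the 'datasets' package.
fit <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)

h <- hatvalues(fit)   # leverages h_{ii}
r <- rstandard(fit)   # standardized residuals r_i
p <- fit$rank         # number of estimated coefficients

# Cook's distance written purely as a function of (r_i, h_{ii})
D <- r^2 * h / (1 - h) / p

# Compare with R's own computation
all.equal(D, cooks.distance(fit))

# Observation with the largest Cook's distance
names(which.max(D))
```

The same (r_i, h_{ii}) pair is what a residuals-versus-leverage plot
displays, which is why contours of constant Cook's distance can be drawn
on it.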
Martin