[Rd] Enhanced version of plot.lm()

John Fox jfox at mcmaster.ca
Fri Apr 29 02:08:43 CEST 2005


Dear John,


> -----Original Message-----
> From: John Maindonald [mailto:john.maindonald at anu.edu.au] 
> Sent: Thursday, April 28, 2005 6:47 PM
> To: John Fox
> Cc: 'Werner Stahel'; 'Peter Dalgaard'; 
> <r-devel at stat.math.ethz.ch>; 'David Firth'; 'Martin Maechler'
> Subject: Re: [Rd] Enhanced version of plot.lm()
> 
> NB also the mention of a possible addition to stats: vif()
> 
> Dear John -
> I think users can cope with six plots offered by one 
> function, with four of them given by default, and the two 
> remaining plots alternative ways of presenting the 
> information in the final default plot.  The idea of plot.lm() 
> was to provide a set of plots that would serve most basic purposes.
> 

I rather like added-variable plots for examining influence and leverage on
coefficients.
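
For readers who have not met the plot: the following is only a minimal base-R sketch of an added-variable (partial regression) plot, assuming an unweighted linear model. The helper name av_plot() is hypothetical and is not a function in stats or car. It regresses both the response and the focal predictor on the remaining columns of the model matrix and plots one set of residuals against the other, so points with high leverage on that coefficient stand out.

```r
# Added-variable plot for one column of the model matrix (sketch only).
# av_plot() is a hypothetical helper, assumed here for illustration.
av_plot <- function(model, term, plot = TRUE) {
  X <- model.matrix(model)
  stopifnot(term %in% colnames(X))
  y <- model.response(model.frame(model))
  others <- X[, colnames(X) != term, drop = FALSE]
  # Residuals of the response and of the focal predictor,
  # each regressed on all the other columns (intercept included)
  e_y <- lm.fit(others, y)$residuals
  e_x <- lm.fit(others, X[, term])$residuals
  if (plot) {
    plot(e_x, e_y,
         xlab = paste(term, "| others"),
         ylab = "response | others")
    abline(lm(e_y ~ e_x))
  }
  invisible(data.frame(e_x = e_x, e_y = e_y))
}

fit <- lm(mpg ~ wt + hp, data = mtcars)
av <- av_plot(fit, "wt", plot = FALSE)
# Slope of the added-variable regression
slope <- unname(coef(lm(e_y ~ e_x, data = av))[2])
```

The slope of the line through the added-variable plot equals the coefficient of the focal predictor in the full fit, which is why the plot shows influence on that one coefficient.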

> It may be reasonable to have a suite of plots for examining 
> residuals and influence.  I'd suggest trying to follow the 
> syntax and labeling conventions as for plot.lm(), unless 
> these seem inappropriate.
> 

I don't have strong feelings about this -- I certainly don't think that the
suggestion is inappropriate.

> While on such matters, there is a function vif() in DAAG, and 
> a more comprehensive function vif() in car.  One of these, 
> probably yours if you are willing, should go into stats.  

I'd have no objection to that.

> There's one addition that I'd make; allow a model matrix as 
> parameter, as an optional alternative to giving the model object.

That seems reasonable -- for linear models, anyway. The current approach
works (at least arguably) for generalized linear models as well. My only
hesitation is that having just the model matrix doesn't ensure that the
model is a linear model. With this caveat, I should be able to handle model
matrices by adding a matrix method to vif (and perhaps printing a warning).
I'll probably do that when I next revise the car package.
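
As an illustration of what a matrix method might look like, here is a rough sketch under stated assumptions. It is not the code in car or DAAG: the generic is named vif2() (a hypothetical name) to avoid any clash, it warns because a bare matrix cannot confirm that the model is linear, and it computes only per-column variance-inflation factors as the diagonal of the inverse correlation matrix of the non-constant columns. It does not attempt generalized VIFs for multi-column terms.

```r
# Hypothetical sketch: vif2() is an illustrative name, not car::vif()
# or DAAG::vif().
vif2 <- function(x, ...) UseMethod("vif2")

# Matrix method: VIFs are the diagonal of the inverse correlation
# matrix of the non-constant columns (the intercept column is dropped).
vif2.matrix <- function(x, ...) {
  warning("assuming the model matrix comes from a linear model")
  is_const <- apply(x, 2, function(col) all(col == col[1]))
  x <- x[, !is_const, drop = FALSE]
  if (ncol(x) < 2) stop("need at least two non-constant columns")
  v <- diag(solve(cor(x)))
  names(v) <- colnames(x)
  v
}

# Model-object method: delegate to the matrix method, without the warning.
vif2.lm <- function(x, ...) suppressWarnings(vif2(model.matrix(x), ...))

fit <- lm(mpg ~ wt + hp + disp, data = mtcars)
v <- vif2(fit)
```

Each value should agree with 1 / (1 - R^2) from regressing that predictor on the others, which gives a quick check on the matrix method.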

Thanks for the suggestion.
 John

> Regards
> John M.
> 
> On 28 Apr 2005, at 10:39 PM, John Fox wrote:
> 
> > Dear John et al.,
> >
> > Curiously, Georges Monette (at York University in Toronto) and I
> > were just talking last week about influence-statistic contours, and
> > I wrote a couple of functions to show these for Cook's D and for
> > covratio as functions of hat-values and studentized residuals. These
> > differ a bit from the ones previously discussed here in that they
> > show rule-of-thumb cut-offs for D and covratio, along with
> > Bonferroni critical values for studentized residuals.
> >
> > I've attached a file with these functions, even though they're not 
> > that polished.
> >
> > More generally, I wonder whether it's not best to supply plots like
> > these as separate functions rather than as a do-it-all plot method
> > for lm objects.
> >
> > Regards,
> >  John
> >
> > --------------------------------
> > John Fox
> > Department of Sociology
> > McMaster University
> > Hamilton, Ontario
> > Canada L8S 4M4
> > 905-525-9140x23604
> > http://socserv.mcmaster.ca/jfox
> > --------------------------------
> >
> >> -----Original Message-----
> >> From: r-devel-bounces at stat.math.ethz.ch 
> >> [mailto:r-devel-bounces at stat.math.ethz.ch] On Behalf Of John 
> >> Maindonald
> >> Sent: Wednesday, April 27, 2005 7:54 PM
> >> To: Martin Maechler
> >> Cc: David Firth; Werner Stahel; r-devel at stat.math.ethz.ch; Peter 
> >> Dalgaard
> >> Subject: Re: [Rd] Enhanced version of plot.lm()
> >>
> >>
> >> On 28 Apr 2005, at 1:30 AM, Martin Maechler wrote:
> >>
> >>>>>>>> "PD" == Peter Dalgaard <p.dalgaard at biostat.ku.dk>
> >>>>>>>>     on 27 Apr 2005 16:54:02 +0200 writes:
> >>>
> >>>     PD> Martin Maechler <maechler at stat.math.ethz.ch> writes:
> >>>>> I'm about to commit the current proposal(s) to R-devel,
> >>>>> **INCLUDING** changing the default from 'which = 1:4' to
> >>>>> 'which = c(1:3,5)'
> >>>>>
> >>>>> and elicit feedback starting from there.
> >>>>>
> >>>>> One thing I think I would like is to use color for the Cook's 
> >>>>> contours in the new 4th plot.
> >>>
> >>>     PD> Hmm. First try running example(plot.lm) with the modified
> >>>     PD> function and tell me which observation has the largest
> >>>     PD> Cook's D. With the suggested new 4th plot it is very hard
> >>>     PD> to tell whether obs #49 is potentially or actually
> >>>     PD> influential. Plots #1 and #3 are very close to conveying
> >>>     PD> the same information though...
> >>>
> >>> I shouldn't be teaching here, and I know that I'm getting into
> >>> fighted territory (regression diagnostics; robustness; "The" Truth,
> >>> etc,etc) but I believe there is no unique way to define "actually
> >>> influential" (hence I don't believe that it's extremely useful to
> >>> know exactly which Cook's D is largest).
> >>>
> >>> Partly because there are many statistics that can be derived from a
> >>> multiple regression fit all of which are influenced in some way.
> >>> AFAIK, all observation-influence measures g(i) are functions of
> >>> (r_i, h_{ii}) and the latter are the quantities that "regression
> >>> users" should really know {without consulting a text book} and that
> >>> are generalizable {e.g. to "linear smoothers" such as gam()s (for
> >>> "non-estimated" smoothing parameter)}.
> >>>
> >>> Martin
> >>
> >> I agree with Martin.  I like the idea of using color (red?) for the
> >> new Cook's contours.  People who want (fairly) precise comparisons
> >> of the Cook's statistics can still use the present plot #4, perhaps
> >> as a follow-up to the new plot #5.
> >> It would be possible to label the Cookwise most extreme points with
> >> the actual values (to perhaps 2sig figures, i.e., labeling on both
> >> sides of such points), but this would add what I consider is
> >> unnecessary clutter to the graph.
> >>
> >> John.
> >>
> >> John Maindonald             email: john.maindonald at anu.edu.au
> >> phone : +61 2 (6125)3473    fax  : +61 2(6125)5549
> >> Centre for Bioinformation Science, Room 1194, John Dedman 
> >> Mathematical Sciences Building (Building 27) Australian National 
> >> University, Canberra ACT 0200.
> >>
> >> ______________________________________________
> >> R-devel at stat.math.ethz.ch mailing list 
> >> https://stat.ethz.ch/mailman/listinfo/r-devel
> > <influence-plots.R>


