[R] influence measures for multivariate linear models
Peter Dalgaard
pdalgd at gmail.com
Tue Aug 10 18:20:14 CEST 2010
Michael Friendly wrote:
> Barrett & Ling, JASA, 1992, v.87(417), pp184-191 define general classes
> of influence measures for multivariate
> regression models, including analogs of Cook's D, Andrews & Pregibon
> COVRATIO, etc. As in univariate
> response models, these are based on leverage and residuals based on
> omitting one (or more) observations at
> a time and refitting, although, in the univariate case, the computations
> can be optimized, as they are in
> stats::influence() and related methods.
>
> I'm interested in exploring the multivariate extension in R. I tried
> the following, and was surprised to find that
> R returned a result rather than an error -- presumably because mlm
> objects are not trapped before they
> get to lm.influence()
>
> > # multivariate model
> > data(Rohwer, package="heplots")
> > rohwer.mod <- lm(cbind(SAT, PPVT, Raven) ~ n + s + ns + na + ss,
> data=Rohwer)
>
> > names(influence(rohwer.mod))
> [1] "hat" "coefficients" "sigma" "wt.res"
> > head(influence(rohwer.mod)$coefficients, 2)
> [,1] [,2] [,3] [,4] [,5] [,6]
> [1,] 2.25039 0.0254739 -0.025252 -0.06297 -0.121507 0.094355
> [2,] 0.84649 -0.0062656 -0.077430 0.08345 -0.022579 -0.059480
> >
>
> Of course, the correct calculations would result from refitting,
> omitting each observation in turn, though doing this
> directly would be horribly inefficient.
> e.g, calculating B(i), deleting case i:
>
> > coef(update(rohwer.mod, subset=1:69 !=1, data=Rohwer))
> SAT PPVT Raven
> (Intercept) -2.466079 35.68664 11.510068
> n 1.888286 0.60949 0.075931
> s -0.034524 -0.53040 0.160328
> ns -2.739834 -0.67355 0.066392
> na 2.219340 1.20481 -0.037272
> ss 1.072300 0.99033 0.058509
> > coef(update(rohwer.mod, subset=1:69 !=2, data=Rohwer))
> SAT PPVT Raven
> (Intercept) -1.062178 33.88199 10.8988006
> n 1.920026 0.59735 0.0713976
> s 0.017654 -0.47464 0.1774135
> ns -2.886254 -0.67905 0.0673686
> na 2.120411 1.29016 -0.0077484
> ss 1.226135 0.96430 0.0471764
>
> Is there anything existing for this case that I've missed, or does
> anyone have an interest in pursuing this topic?
Hmm, fitted coefficients in this sort multivariate models are the same
as those in the univariate ones, so as long as you do whole-case
deletions, I would think that you should be able to reuse the 1D code. I
would conjecture that the main problem with what you currently get is
that it only pertains to the 1st column -- looks like the differences
between the two rows from lm.influence matches the differences between
the first two colums from coef(update(...)).
Since lm() only handles complete cases, casewise deletion diagnostics is
probably the best you can get, otherwise it would be interesting to see
the effect of deleting each coordinate separately.
(As you know, these matters are within my general sphere of interest,
but I'm afraid my time is too constrained at them moment for more than a
sideline view.)
>
> -Michael
>
--
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Phone: (+45)38153501
Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com
More information about the R-help
mailing list