[R] influence measures for multivariate linear models
Michael Friendly
friendly at yorku.ca
Wed Sep 15 00:10:55 CEST 2010
I'm following up on a question I posted 8/10/2010, but my newsreader
has lost this thread.
> Barrett & Ling, JASA, 1992, v.87(417), pp184-191 define general
> classes of influence measures for multivariate
> regression models, including analogs of Cook's D, Andrews & Pregibon
> COVRATIO, etc. As in univariate
> response models, these are based on leverage and residuals based on
> omitting one (or more) observations at
> a time and refitting, although, in the univariate case, the
> computations can be optimized, as they are in
> stats::influence() and related methods.
>
> I'm interested in exploring the multivariate extension in R. I tried
> the following, and was surprised to find that
> R returned a result rather than an error -- presumably because mlm
> objects are not trapped before they
> get to lm.influence()
>
I've done a bit more testing, comparing what I get from R lm functions
with some direct calculations in both
R and SAS, and now conclude that there are bugs in lm.influence and the
coef(update()) methods I tried
before to get leave-one-out coefficients for multivariate response
models fit with lm().
By bugs, I mean that results returned are silently wrong, or at best,
misleading.
My test case: a subset of the Rohwer data anayzed by Barrett & Ling (&
others)
> library(heplots)
> data(Rohwer, package="heplots")
> Rohwer1 <- Rohwer[Rohwer$SES=='Hi',]
> rohwer.mod <- lm(cbind(SAT, PPVT, Raven) ~ n + s + ns + na + ss,
data=Rohwer1)
> (B <- coef(rohwer.mod))
SAT PPVT Raven
(Intercept) -28.467471 39.6970896 13.2438356
n 3.257132 0.0672825 0.0593474
s 2.996583 0.3699843 0.4924440
ns -5.859063 -0.3743802 -0.1640219
na 5.666223 1.5230090 0.1189805
ss -0.622653 0.4101567 -0.1211564
Attempt to get the leave-one-out coefficients, B(-i), for the first 2
cases from influence(): *wrong* -- influence() should
probably issue
stop('Not defined for mlm objects'), or the documentation should be made
clearer for what happens in this
case.
> head(influence(rohwer.mod, do.coef=TRUE)$coefficients, 2)
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] -5.56830 0.0510135 -0.486096 0.61811 0.0585384 -0.1339219
[2,] 2.30754 0.1092886 0.388349 -0.40717 -0.0600899 0.0477736
The correct result, for the first case is
> # direct calculation
> X <- cbind(1, as.matrix(Rohwer1[,6:10]))
> Y <- as.matrix(Rohwer1[,3:5])
> keep <- 2:n
> (B1 <- solve(t(X[keep,]) %*% X[keep,]) %*% t(X[keep,]) %*% Y[keep,])
SAT PPVT Raven
-22.899170 41.7757793 13.5057156
n 3.206119 0.0482388 0.0569482
s 3.482679 0.5514477 0.5153053
ns -6.477172 -0.6051252 -0.1930919
na 5.607684 1.5011562 0.1162274
ss -0.488731 0.4601508 -0.1148580
OK, ?influence tells me that the 'coefficients' returned are actually
B-B(-i), and I can see that
it gives those for the first response variable (SAT)
> B-B1
SAT PPVT Raven
(Intercept) -5.5683009 -2.0786897 -0.26187994
n 0.0510135 0.0190437 0.00239919
s -0.4860961 -0.1814634 -0.02286134
ns 0.6181096 0.2307451 0.02907000
na 0.0585384 0.0218528 0.00275309
ss -0.1339219 -0.0499941 -0.00629841
For SAT, same as re-fitting a univariate model:
> rohwer.inf1 <- influence(lm(SAT ~ n + s + ns + na + ss,
data=Rohwer1), do.coef=TRUE)
> colnames(rohwer.inf1$coefficients) <- paste("SAT",
colnames(rohwer.inf1$coefficients), sep=":")
> head(rohwer.inf1$coefficients, 2)
SAT:(Intercept) SAT:n SAT:s SAT:ns SAT:na SAT:ss
38 -5.56830 0.0510135 -0.486096 0.61811 0.0585384 -0.1339219
39 2.30754 0.1092886 0.388349 -0.40717 -0.0600899 0.0477736
>
But I'm looking for a method I can generalize. I also tried the
following, using subset= in
update() and lm(), giving a different wrong answer. (Am I using the
subset= argument
correctly?)
> coef(update(rohwer.mod, subset=1:n !=1, data=Rohwer1))
SAT PPVT Raven
(Intercept) -28.428163 51.762326 11.0052852
n 0.912497 -0.316177 0.1294643
s 5.478358 0.280319 0.6639180
ns -6.119679 -1.828516 -0.2995350
na 5.383479 2.096793 0.1940462
ss -0.345764 0.448199 -0.0725977
Warning message:
In 1:n : numerical expression has 32 elements: only the first used
So, how can I calculate leave-one-out coefficients (and other
quantities) efficiently
for mlm-s?
-Michael
--
Michael Friendly Email: friendly AT yorku DOT ca
Professor, Psychology Dept.
York University Voice: 416 736-5115 x66249 Fax: 416 736-5814
4700 Keele Street Web: http://www.datavis.ca
Toronto, ONT M3J 1P3 CANADA
More information about the R-help
mailing list