[R] refitting lm() with same x, different y
Douglas Bates
bates at stat.wisc.edu
Tue Apr 19 15:51:32 CEST 2005
William Valdar wrote:
>
>> From: Brian Ripley
>>
>> As a first shot, use lm with a matrix response. That fits them all at
>> once with one QR-decomposition. No analogue for glm or lmer, though,
>> since for those the iterative fits run do depend on the response.
>
>
> Thanks Brian, that's very helpful. Also thanks to Kevin Wright who
> suggested using lsfit(x,Y) as being faster than lm for a Y matrix.
>
> I've since worked out that I can bypass even more lm machinery by basing
> my permutation test significance thresholds on the RSS from qr.resid().
> Since,
>
> y = QRb + e
> Q'y = Rb + Q'e
> RSS = || Q'y - Rb ||
>
> then I can do
>
> X.qr <- qr(X)
>
> once, and for every instance of y calculate
>
> e <- qr.resid(X.qr, y)
> rss <- e %*% e
>
> recording them in
>
> rss.for.all.fits[i] <- rss
>
> which gives me an empirical distribution of RSS scores. The degrees of
> freedom in my X matrix are constant throughout (I should have said that
> before), so all RSS's are on a level footing and map trivially to the
> p-value. I can therefore take the RSS at, say, the 5th percentile, turn
> it into a p-value and report that as my 5% threshold.
Actually you do not need to calculate the residuals to be able to
calculate RSS. If you write Q = [Q1 Q2] where Q1 is the first p columns
and Q2 is the remaining n - p columns and R1 for the first p rows of R
then your expression for RSS can be extended as
RSS = || Q'y - Rb || = || Q1'y - R1 b || + || Q2'y ||
because the last n - p rows of R are zero. At the least squares value
of b the first term is zero when X has full column rank. Thus
rss <- sum(qr.qty(X.qr, y)[-(1:p)]^2)
qr.qty should be slightly faster than qr.resid because it performs only
one (virtual) multiplication by Q or Q'. I doubt that the difference
would be noticeable in practice.
More information about the R-help
mailing list