[R] functions to calculate t-stats, etc. for lm.fit objects?
Marc Schwartz
marc_schwartz at me.com
Wed Jul 8 16:50:56 CEST 2009
On Jul 8, 2009, at 8:51 AM, Whit Armstrong wrote:
> I'm running a huge number of regressions in a loop, so I tried lm.fit
> for a speedup. However, I would like to be able to calculate the
> t-stats for the coefficients.
>
> Does anyone have some functions for calculating the regression summary
> stats of an lm.fit object?
>
> Thanks,
> Whit
Whit, depending upon just how much time savings you are realizing by
using lm.fit() and not lm(), the approach to your question may vary.
Do you need all of the models, or only a subset?
If the latter, then I would narrow down your model set and re-run them
with lm() so that you can use summary.lm() directly. That would entail
less custom coding, which may otherwise offset any time savings from
using lm.fit()
If the former, then there are two choices as I see them.
The first would be to restructure the object resulting from lm.fit()
by adding the elements required to run summary.lm(). However, I would
think that this overhead would bring you back to a point where just
using lm() would be a better approach from a time standpoint.
The second would be to cook up a function that only provides the
subset of results that you need from summary.lm() and then use that on
the results of lm.fit(). Here again, there remains the question of
just how much time are you saving using lm.fit() versus the additional
overhead of calculating even a subset of the output.
Here is a very simple approach to a function that would get you a
subset of the output that you would get using, for example,
coef(summary(lm.object)). This is using a selective approach of
copying and slightly editing code from summary.lm(). Note that there
is other code in summary.lm() to handle weights and such, if your
models are more complex. You would need to add that in if that is the
case.
If you need much more summary output than this on each model, then I
think you would be better off just using lm() and summary.lm().
# Use at your own risk...untested on more complex models :-)
# 'x' is an lm.fit object
calc.lm.t <- function(x)
{
Qr <- x$qr
r <- x$residuals
p <- x$rank
p1 <- 1L:p
rss <- sum(r^2)
n <- NROW(Qr$qr)
rdf <- n - p
resvar <- rss/rdf
R <- chol2inv(Qr$qr[p1, p1, drop = FALSE])
se <- sqrt(diag(R) * resvar)
est <- x$coefficients[Qr$pivot[p1]]
tval <- est/se
res <- cbind(est = est, se = se, tval = tval)
res
}
Here is some simple example data:
set.seed(1)
y <- rnorm(100)
x <- rnorm(100)
# Get the default coefficient output using summary.lm()
> coef(summary(lm(y ~ x)))
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.1088521158 0.09034800 1.20480938 0.2311784
x -0.0009323697 0.09472155 -0.00984327 0.9921663
# Now use calc.lm.t
lmf <- lm.fit(model.matrix(y ~ x), y)
> calc.lm.t(lmf)
est se tval
(Intercept) 0.1088521158 0.09034800 1.20480938
x -0.0009323697 0.09472155 -0.00984327
I'll leave it to you to see whether this approach may or may not be
helpful from a time perspective.
HTH,
Marc Schwartz
More information about the R-help
mailing list