Decyphering formulas seems to be the most time consuming part of lm:
mylm1 <- function(formula, data) {
# not perfect but works
F <- model.frame(formula,data)
y <- model.response(F)
mt <- attr(F, "terms")
x <- model.matrix(mt,F)
coefs <- solve(crossprod(x), crossprod(x,y))
coefs
}
mylm2 <- function(x, y, intercept=TRUE) {
if(!is.matrix(x)) x <- as.matrix(x)
if(intercept) x <- cbind(1,x)
if(!is.matrix(y)) y <- as.matrix(y)
solve(crossprod(x), crossprod(x,y))
}
> system.time(for(i in 1:1000) mylm2(EuStockMarkets[,-1],
EuStockMarkets[,"DAX"]))
user system elapsed
6.43 0.00 6.53
> system.time(for(i in 1:1000) mylm1(DAX~., EuStockMarkets))
user system elapsed
16.19 0.00 16.23
> system.time(for(i in 1:1000) lm(DAX~., EuStockMarkets))
user system elapsed
21.43 0.00 21.44
So if you need to save time, I'd suggest something close to mylm2 rather
than mylm1.
Kenn
On Thu, Feb 19, 2009 at 8:04 PM, Gabor Grothendieck wrote:
> On Thu, Feb 19, 2009 at 8:30 AM, Esmail Bonakdarian
> wrote:
>
> Thanks for the suggestions, I'll have to see if I can figure out how to
> > convert the relatively simple call to lm with an equation and the data
> file
> > to the functions you mention (or if that's even feasible).
>
> X <- model.matrix(formula, data)
>
> will calculate the X matrix for you.
>
> >
> > Not an expert in statistics myself, I am mostly concentrating on the
> > programming aspects of R. Problem is that I suspect my colleagues who
> > are providing some guidance with the stats end are not quite experts
> > themselves, and certainly new to R.
> >
> > Cheers,
> >
> > Esmail
> >
> > Kenn Konstabel wrote:
> >>
> >> lm does lots of computations, some of which you may never need. If speed
> >> really matters, you might want to compute only those things you will
> really
> >> use. If you only need coefficients, then using %*%, solve and crossprod
> will
> >> be remarkably faster than lm
> >>
> >> # repeating someone else's example
> >> # lm(DAX~., EuStockMarkets)
> >>
> >> y <- EuStockMarkets[,"DAX"]
> >> x <- EuStockMarkets
> >> x[,1]<-1
> >> colnames(x)[1] <- "Intercept"
> >>
> >> lm(y ~ x-1)
> >> solve(crossprod(x), t(x))%*%y # probably this can be done more
> >> efficiently
> >>
> >> # and a naive timing
> >>
> >> > system.time( for(i in 1:1000) lm(y ~ x-1))
> >> user system elapsed
> >> 14.64 0.33 32.69
> >> > system.time(for(i in 1:1000) solve(crossprod(x), crossprod(x,y)) )
> >> user system elapsed
> >> 0.36 0.00 0.36
> >>
> >>
> >> Also lsfit() is a bit quicker than lm or lm.fit.
> >>
> >> Regards,
> >> Kenn
> >
> > ______________________________________________
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
[[alternative HTML version deleted]]