[R] Python and R
Gabor Grothendieck
ggrothendieck at gmail.com
Fri Feb 20 13:06:54 CET 2009
Note that using solve can be numerically unstable for certain problems.
On Fri, Feb 20, 2009 at 6:50 AM, Kenn Konstabel <lebatsnok at gmail.com> wrote:
> Decyphering formulas seems to be the most time consuming part of lm:
>
> mylm1 <- function(formula, data) {
> # not perfect but works
> F <- model.frame(formula,data)
> y <- model.response(F)
> mt <- attr(F, "terms")
> x <- model.matrix(mt,F)
> coefs <- solve(crossprod(x), crossprod(x,y))
> coefs
> }
>
> mylm2 <- function(x, y, intercept=TRUE) {
> if(!is.matrix(x)) x <- as.matrix(x)
> if(intercept) x <- cbind(1,x)
> if(!is.matrix(y)) y <- as.matrix(y)
> solve(crossprod(x), crossprod(x,y))
> }
>
>> system.time(for(i in 1:1000) mylm2(EuStockMarkets[,-1],
>> EuStockMarkets[,"DAX"]))
> user system elapsed
> 6.43 0.00 6.53
>> system.time(for(i in 1:1000) mylm1(DAX~., EuStockMarkets))
> user system elapsed
> 16.19 0.00 16.23
>> system.time(for(i in 1:1000) lm(DAX~., EuStockMarkets))
> user system elapsed
> 21.43 0.00 21.44
>
> So if you need to save time, I'd suggest something close to mylm2 rather
> than mylm1.
>
> Kenn
>
>
>
> On Thu, Feb 19, 2009 at 8:04 PM, Gabor Grothendieck
> <ggrothendieck at gmail.com> wrote:
>>
>> On Thu, Feb 19, 2009 at 8:30 AM, Esmail Bonakdarian <esmail.js at gmail.com>
>> wrote:
>>
>> > Thanks for the suggestions, I'll have to see if I can figure out how to
>> > convert the relatively simple call to lm with an equation and the data
>> > file
>> > to the functions you mention (or if that's even feasible).
>>
>> X <- model.matrix(formula, data)
>>
>> will calculate the X matrix for you.
>>
>> >
>> > Not an expert in statistics myself, I am mostly concentrating on the
>> > programming aspects of R. Problem is that I suspect my colleagues who
>> > are providing some guidance with the stats end are not quite experts
>> > themselves, and certainly new to R.
>> >
>> > Cheers,
>> >
>> > Esmail
>> >
>> > Kenn Konstabel wrote:
>> >>
>> >> lm does lots of computations, some of which you may never need. If
>> >> speed
>> >> really matters, you might want to compute only those things you will
>> >> really
>> >> use. If you only need coefficients, then using %*%, solve and crossprod
>> >> will
>> >> be remarkably faster than lm
>> >>
>> >> # repeating someone else's example
>> >> # lm(DAX~., EuStockMarkets)
>> >>
>> >> y <- EuStockMarkets[,"DAX"]
>> >> x <- EuStockMarkets
>> >> x[,1]<-1
>> >> colnames(x)[1] <- "Intercept"
>> >>
>> >> lm(y ~ x-1)
>> >> solve(crossprod(x), t(x))%*%y # probably this can be done more
>> >> efficiently
>> >>
>> >> # and a naive timing
>> >>
>> >> > system.time( for(i in 1:1000) lm(y ~ x-1))
>> >> user system elapsed
>> >> 14.64 0.33 32.69
>> >> > system.time(for(i in 1:1000) solve(crossprod(x), crossprod(x,y)) )
>> >> user system elapsed
>> >> 0.36 0.00 0.36
>> >>
>> >>
>> >> Also lsfit() is a bit quicker than lm or lm.fit.
>> >>
>> >> Regards,
>> >> Kenn
>> >
>> > ______________________________________________
>> > R-help at r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide
>> > http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>> >
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
>
More information about the R-help
mailing list