[R] Python and R

Gabor Grothendieck ggrothendieck at gmail.com
Fri Feb 20 13:06:54 CET 2009


Note that using solve can be numerically unstable for certain problems.

On Fri, Feb 20, 2009 at 6:50 AM, Kenn Konstabel <lebatsnok at gmail.com> wrote:
> Decyphering formulas seems to be the most time consuming part of lm:
>
> mylm1 <- function(formula, data) {
>     # not perfect but works
>     F <- model.frame(formula,data)
>     y <- model.response(F)
>     mt <- attr(F, "terms")
>     x <- model.matrix(mt,F)
>     coefs <- solve(crossprod(x), crossprod(x,y))
>     coefs
>     }
>
> mylm2 <- function(x, y, intercept=TRUE) {
>     if(!is.matrix(x)) x <- as.matrix(x)
>     if(intercept) x <- cbind(1,x)
>     if(!is.matrix(y)) y <- as.matrix(y)
>     solve(crossprod(x), crossprod(x,y))
>     }
>
>> system.time(for(i in 1:1000) mylm2(EuStockMarkets[,-1],
>> EuStockMarkets[,"DAX"]))
>    user  system elapsed
>    6.43    0.00    6.53
>> system.time(for(i in 1:1000) mylm1(DAX~., EuStockMarkets))
>    user  system elapsed
>   16.19    0.00   16.23
>> system.time(for(i in 1:1000) lm(DAX~., EuStockMarkets))
>    user  system elapsed
>   21.43    0.00   21.44
>
> So if you need to save time, I'd suggest something close to mylm2 rather
> than mylm1.
>
> Kenn
>
>
>
> On Thu, Feb 19, 2009 at 8:04 PM, Gabor Grothendieck
> <ggrothendieck at gmail.com> wrote:
>>
>> On Thu, Feb 19, 2009 at 8:30 AM, Esmail Bonakdarian <esmail.js at gmail.com>
>> wrote:
>>
>> > Thanks for the suggestions, I'll have to see if I can figure out how to
>> > convert the relatively simple call to lm with an equation and the data
>> > file
>> > to the functions you mention (or if that's even feasible).
>>
>> X <- model.matrix(formula, data)
>>
>> will calculate the X matrix for you.
>>
>> >
>> > Not an expert in statistics myself, I am mostly concentrating on the
>> > programming aspects of R. Problem is that I suspect my colleagues who
>> > are providing some guidance with the stats end are not quite experts
>> > themselves, and certainly new to R.
>> >
>> > Cheers,
>> >
>> > Esmail
>> >
>> > Kenn Konstabel wrote:
>> >>
>> >> lm does lots of computations, some of which you may never need. If
>> >> speed
>> >> really matters, you might want to compute only those things you will
>> >> really
>> >> use. If you only need coefficients, then using %*%, solve and crossprod
>> >> will
>> >> be remarkably faster than lm
>> >>
>> >> # repeating someone else's example
>> >> # lm(DAX~., EuStockMarkets)
>> >>
>> >>  y <- EuStockMarkets[,"DAX"]
>> >>  x <- EuStockMarkets
>> >>  x[,1]<-1
>> >> colnames(x)[1] <- "Intercept"
>> >>
>> >> lm(y ~ x-1)
>> >> solve(crossprod(x), t(x))%*%y    # probably this can be done more
>> >> efficiently
>> >>
>> >> # and a naive timing
>> >>
>> >>  > system.time( for(i in 1:1000) lm(y ~ x-1))
>> >>   user  system elapsed
>> >>  14.64    0.33   32.69
>> >>  > system.time(for(i in 1:1000) solve(crossprod(x), crossprod(x,y)) )
>> >>   user  system elapsed
>> >>   0.36    0.00    0.36
>> >>
>> >>
>> >> Also lsfit() is a bit quicker than lm or lm.fit.
>> >>
>> >> Regards,
>> >> Kenn
>> >
>> > ______________________________________________
>> > R-help at r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide
>> > http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>> >
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
>




More information about the R-help mailing list