Bert Gunter
gunter.berton at gene.com
Wed Apr 30 00:43:41 CEST 2008
If you can (lowess() handles one predictor only), try using lowess() instead,
probably in a for loop as Ray suggested.
loess() is more powerful and flexible, but you pay for that in extra
complexity and time. In this case it may not be worth it.
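A minimal sketch of the lowess()-in-a-for-loop idea follows. It assumes each column is a series observed at equally spaced times 1, 2, ..., nrow (the original post does not say what the predictor is) and that lowess()'s default span f = 2/3 is acceptable. The helper name smooth_cols_lowess is hypothetical.

```r
# Hypothetical helper: smooth every column of a matrix with lowess(),
# filling a preallocated result matrix in a plain for loop.
# Assumption: each column is a series observed at times 1, 2, ..., nrow(m).
smooth_cols_lowess <- function(m, f = 2/3) {
  tt <- seq_len(nrow(m))                      # assumed equally spaced time index
  out <- matrix(NA_real_, nrow(m), ncol(m))   # preallocate once
  for (j in seq_len(ncol(m))) {
    # lowess() returns list(x, y) sorted by x; tt is already sorted,
    # so the smoothed values line up with the original rows
    out[, j] <- lowess(tt, m[, j], f = f)$y
  }
  out
}

set.seed(1)
m <- matrix(rnorm(21 * 100), nrow = 21)  # small stand-in for the 21 x 9000000 matrix
sm <- smooth_cols_lowess(m)
print(dim(sm))
```

Note that lowess() and loess() use different defaults (span, local polynomial degree), so the smoothed values will not match a loess() fit exactly; it is worth timing both on a few thousand columns before committing to either.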
In addition to Tony's suggestion, have a look at the following sequence, in
which the for loop is consistently faster than apply(). I suspect this is
because the call to apply() will duplicate your 1.5GB matrix, whereas the for
loop doesn't [I stand to be corrected here].
> x <- matrix(runif(210000), 21)
> unix.time({res <- numeric(ncol(x)); for(i in 1:length(res)) res[i] <-
+ sum(x[, i])})
   user  system elapsed
  0.079   0.000   0.079
> unix.time(apply(x, 2, sum))
   user  system elapsed
   0.10    0.01    0.11
> x <- matrix(runif(2100000), 21)
> unix.time({res <- numeric(ncol(x)); for(i in 1:length(res)) res[i] <-
+ sum(x[, i])})
   user  system elapsed
  0.791   0.010   0.801
> unix.time(apply(x, 2, sum))
   user  system elapsed
  1.096   0.011   1.107
> x <- matrix(runif(21000000), 21)
> unix.time({res <- numeric(ncol(x)); for(i in 1:length(res)) res[i] <-
+ sum(x[, i])})
   user  system elapsed
  7.825   0.011   7.840
> unix.time(apply(x, 2, sum))
   user  system elapsed
 15.431   0.142  15.592
Also, preliminary checking with the top utility shows that the for loop
requires just over half the memory of the apply() call. This is on a NetBSD
system with 2GB of memory.
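The preallocated-loop pattern from the transcript can be wrapped up for reuse. This is a minimal sketch assuming the per-column function returns a single number; the name loop_cols is hypothetical, and it uses system.time() (the current name for the timer) on the smallest matrix from the transcript. It checks that both approaches agree; it does not measure memory.

```r
# Hypothetical helper: apply f to each column using a preallocated
# result vector and a for loop, instead of apply(x, 2, f).
# Assumption: f returns a single number for each column.
loop_cols <- function(x, f) {
  res <- numeric(ncol(x))            # allocate the result once
  for (i in seq_len(ncol(x))) res[i] <- f(x[, i])
  res
}

x <- matrix(runif(210000), 21)       # same size as the first run above

t_loop  <- system.time(r_loop  <- loop_cols(x, sum))
t_apply <- system.time(r_apply <- apply(x, 2, sum))

# Both approaches should give the same column sums
print(all.equal(r_loop, r_apply))
print(t_loop)
print(t_apply)
```

Timings will vary with machine and R version, so the ratio seen above should be rechecked locally.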
> It's quite possible that much of the time spent in loess() goes into setting
> up the data (i.e., the formula, terms, model.frame, etc.), and that much of
> that is repeated identically for each call to loess(). I would suggest
> looking at the code of loess(), working out what arguments it calls
> simpleLoess() with, and then trying to call stats:::simpleLoess() directly.
> (Of course, you have to be careful with this, because it does not use the
> published API.)
>
> -- Tony Plate
>
> Sudipta Sarkar wrote:
> > Respected R experts,
> > I am trying to apply a user function that basically calls the
> > loess() function from the stats package on each time
> > series. I have a large matrix of size 21 X 9000000 and I need
> > to apply loess to each column, so I have
> > implemented a separate user function, foo, that applies loess
> > to one column, and I am calling it as follows:
> > xc<-apply(t,2,foo) where t is my 21 X 9000000 matrix and foo
> > wraps loess. This is turning out to be a very slow process, and I
> > need to repeat this step for 25-30 such large matrix chunks.
> > Is there any trick I can use to make this work faster?
> > Any help will be deeply appreciated.
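As a first step toward Tony's suggestion above, the sketch below finds where loess() hands off to the internal simpleLoess() and prints the arguments that function accepts. simpleLoess() is unexported and not part of the published stats API, so its argument list may differ between R versions; check it on your own installation before calling it directly.

```r
# Lines of loess()'s source that mention the internal fitting function
src <- deparse(stats::loess)
print(grep("simpleLoess", src, value = TRUE))

# Formal arguments of the unexported simpleLoess() (may vary by R version)
print(args(stats:::simpleLoess))
```

Comparing the printed call with the formal arguments shows which values loess() computes from the formula and control settings; those are the pieces that could be computed once and reused across columns.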
