[R] Applying user function over a large matrix

Bert Gunter gunter.berton at gene.com
Wed Apr 30 00:43:41 CEST 2008


If you can (that is, if you have only one predictor), try using lowess()
instead, probably in a for loop as Ray suggested.

loess() is more powerful and flexible, but you pay for it in extra
complexity and time. Maybe in this case, it's not worth it.
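
A minimal sketch of the lowess()-in-a-loop idea, under stated assumptions:
the matrix here is a small random stand-in for the real 21 x 9000000 data,
the time points are assumed equally spaced, and the names t_mat, time_idx
and smoothed are made up for illustration. lowess() and loess() use
different defaults, so the fitted values will not match loess() exactly.

```r
# Small stand-in for the 21 x 9,000,000 matrix; each column is one series.
set.seed(1)
n_time   <- 21
n_series <- 1000              # assumption: reduced size for illustration
t_mat    <- matrix(runif(n_time * n_series), nrow = n_time)
time_idx <- seq_len(n_time)   # assumption: equally spaced time points

# Preallocate the result, then smooth one column per iteration.
smoothed <- matrix(NA_real_, nrow = n_time, ncol = n_series)
for (i in seq_len(n_series)) {
  # lowess() returns a list with components x (sorted) and y (fitted values);
  # f is the smoother span, 2/3 being its default.
  smoothed[, i] <- lowess(time_idx, t_mat[, i], f = 2/3)$y
}
```

Because time_idx is already sorted, the fitted values come back in the
same order as the rows of the input column.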

-- Bert Gunter
Genentech

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
Behalf Of Ray Brownrigg
Sent: Tuesday, April 29, 2008 3:19 PM
To: r-help at r-project.org
Cc: Tony Plate
Subject: Re: [R] Applying user function over a large matrix

In addition to Tony's suggestion, have a look at the following timings.
I suspect the difference arises because the call to apply() duplicates
your 1.5GB matrix, whereas the for loop doesn't [I stand to be corrected
here].

> x <- matrix(runif(210000), 21)
> unix.time({res <- numeric(ncol(x)); for(i in 1:length(res)) res[i] <- sum(x[, i])})
   user  system elapsed
  0.079   0.000   0.079
> unix.time(apply(x, 2, sum))
   user  system elapsed
   0.10    0.01    0.11
> x <- matrix(runif(2100000), 21)
> unix.time({res <- numeric(ncol(x)); for(i in 1:length(res)) res[i] <- sum(x[, i])})
   user  system elapsed
  0.791   0.010   0.801
> unix.time(apply(x, 2, sum))
   user  system elapsed
  1.096   0.011   1.107
> x <- matrix(runif(21000000), 21)
> unix.time({res <- numeric(ncol(x)); for(i in 1:length(res)) res[i] <- sum(x[, i])})
   user  system elapsed
  7.825   0.011   7.840
> unix.time(apply(x, 2, sum))
   user  system elapsed
 15.431   0.142  15.592
> 

Also, preliminary checking using the top utility shows the for loop requires
just over half the memory of the apply() call.  This is on a NetBSD system
with 2GB memory.
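
The comparison above can be rerun as a script. This is only a sketch: it
uses system.time() (unix.time() in the transcript was an older alias for
it), the names res_loop, res_apply, loop_time and apply_time are made up
for illustration, and the timings will differ by machine and R version.

```r
x <- matrix(runif(210000), 21)   # same shape as the smallest case above

# Explicit loop with a preallocated result vector.
loop_time <- system.time({
  res_loop <- numeric(ncol(x))
  for (i in seq_along(res_loop)) res_loop[i] <- sum(x[, i])
})

# Same computation through apply().
apply_time <- system.time(res_apply <- apply(x, 2, sum))

# Both approaches should give the same column sums.
stopifnot(isTRUE(all.equal(res_loop, res_apply)))

print(loop_time)
print(apply_time)
```

Checking that the two results agree before comparing the timings guards
against benchmarking two different computations by mistake.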

HTH,
Ray Brownrigg

On Wed, 30 Apr 2008, Tony Plate wrote:
> It's quite possible that much of the time spent in loess() is setting up
> the data (i.e., the formula, terms, model.frame, etc.), and that much of
> that is repeated identically for each call to loess().  I would suggest
> looking at the code of loess(), working out what arguments it calls
> simpleLoess() with, and then calling stats:::simpleLoess() directly.
> (Of course you have to be careful with this because this is not using the
> published API).
>
> -- Tony Plate
>
> Sudipta Sarkar wrote:
> > Respected R experts,
> > I am trying to apply a user function that calls the R loess
> > function from the stats package on each time series. I have a
> > large matrix of size 21 X 9000000 and need to apply loess to
> > each column, so I have implemented a user function foo that
> > does this, and I call it as follows:
> > xc<-apply(t,2,foo) where t is my 21 X 9000000 matrix and
> > foo calls loess. This is turning out to be a very slow process and I
> > need to repeat this step for 25-30 such large matrix chunks.
> > Is there any trick I can use to make this work faster?
> > Any help will be deeply appreciated.
> > Regards
> >
> >
> > Sudipta Sarkar PhD
> > Senior Analyst/Scientist
> > Lanworth Inc. (Formerly Forest One Inc.)
> > 300 Park Blvd., Ste 425
> > Itasca, IL
> > Ph: 630-250-0468
> >
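
Tony's suggestion quoted above starts with reading the source. A minimal
sketch of that first step follows; it only inspects the functions and
does not call the internal routine. stats:::simpleLoess() is not part of
the published API, so its argument list may differ between R versions,
and the names inner and inner_args are made up for illustration.

```r
# Print the source of the exported wrapper to see how it prepares its inputs.
print(stats::loess)

# simpleLoess() is internal to the stats namespace, hence the triple colon.
inner <- stats:::simpleLoess

# List its formal arguments; check these against your own R version
# before attempting a direct call.
inner_args <- names(formals(inner))
print(inner_args)
```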
