[R] memory usage grows too fast

Ping-Hsun Hsieh hsiehp at ohsu.edu
Fri May 15 19:15:42 CEST 2009


Hi William,

Thanks for the comments and explanation.
It is really good to know the details of rowMeans.
I did modify Peter's code, changing length(x[x=="02"]) to sum(x=="02"), though it improved the run time by only a few seconds. :)
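
For the record, a toy check that the two expressions count the same
thing; sum() just avoids building the intermediate subset vector:

	x <- c("01", "02", "02", "00")
	length(x[x == "02"])  # 2, after allocating the subset x[x == "02"]
	sum(x == "02")        # 2, counts the TRUEs directly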

Best,
Mike

-----Original Message-----
From: William Dunlap [mailto:wdunlap at tibco.com] 
Sent: Friday, May 15, 2009 10:09 AM
To: Ping-Hsun Hsieh
Subject: RE: [R] memory usage grows too fast

rowMeans(dataMatrix=="02") must
  (a) make a logical matrix with the dimensions of dataMatrix in which
      to put the result of dataMatrix=="02" (4 bytes/logical element)
  (b) make a double precision matrix (8 bytes/element) the size of
      that logical matrix, because rowMeans uses some C code that only
      works on doubles
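
You can see those two temporary copies directly (a sketch on a small
1000x1000 matrix; scaled up to a 1000x1000000 matrix they become
roughly 4GB and 8GB, which matches the >8G you saw):

   m <- matrix("02", nrow = 1000, ncol = 1000)
   lg <- m == "02"                       # copy (a): logical matrix
   print(object.size(lg), units = "MB")  # ~3.8 MB (4 bytes/element)
   db <- lg + 0                          # copy (b): coerced to double
   print(object.size(db), units = "MB")  # ~7.6 MB (8 bytes/element)
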
apply(dataMatrix,1,function(x)length(x[x=="02"])/ncol(dataMatrix))
never has to make any copies of the entire matrix.  It extracts a row
at a time and when it is done with the row, the memory used for
working on the row is available for other uses.  Note that it would
probably be a tad faster if it were changed to
   apply(dataMatrix,1,function(x)sum(x=="02")) / ncol(dataMatrix)
as sum(logicalVector) is the same as length(x[logicalVector]) and there
is no need to compute ncol(dataMatrix) more than once.
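
A quick check that the two approaches agree (a sketch on a small
random matrix, with "02" in roughly a third of the cells):

   set.seed(1)
   m <- matrix(sprintf("%02d", sample(0:2, 1e6, replace = TRUE)),
               nrow = 1000)
   f1 <- rowMeans(m == "02")
   f2 <- apply(m, 1, function(x) sum(x == "02")) / ncol(m)
   all.equal(f1, f2)   # TRUE
   system.time(rowMeans(m == "02"))
   system.time(apply(m, 1, function(x) sum(x == "02")) / ncol(m))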

Bill Dunlap
TIBCO Software Inc - Spotfire Division
wdunlap tibco.com  

> -----Original Message-----
> From: Ping-Hsun Hsieh [mailto:hsiehp at ohsu.edu] 
> Sent: Friday, May 15, 2009 9:58 AM
> To: Peter Alspach; William Dunlap; hadley wickham
> Cc: r-help at r-project.org
> Subject: RE: [R] memory usage grows too fast
> 
> Thanks to Peter, William, and Hadley for your help.
> Your code is much more concise than mine.  :P
>  
> William's and Hadley's suggestions are the same. Here is their code:
> 
> 	f <- function(dataMatrix) rowMeans(dataMatrix=="02")
> 
> And Peter's code is the following:
> 
> 	apply(yourMatrix, 1,
> 	      function(x) length(x[x==yourPattern])) / ncol(yourMatrix)
> 
> 
> In terms of running time, the first one ran faster than the second
> on my dataset (2.5 mins vs. 6.4 mins).
> The memory consumption of the first, however, is much higher than
> that of the second ( >8G vs. ~3G ).
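> 
> (For anyone wanting to reproduce such measurements, a sketch using
> gc(): its "max used" columns report the peak memory since the last
> reset. Here dataMatrix stands for the real matrix.)
> 
> 	gc(reset = TRUE)
> 	f <- rowMeans(dataMatrix == "02")
> 	gc()   # the "max used" column shows the peak during the call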
> 
> Any thoughts? My guess is that rowMeans creates extra copies to
> perform its calculation, but I am not sure.
> I am also interested in understanding ways to handle memory issues.
> Hope someone can shed light on this for me. :)
> 
> Best,
> Mike
> 
> -----Original Message-----
> From: Peter Alspach [mailto:PAlspach at hortresearch.co.nz] 
> Sent: Thursday, May 14, 2009 4:47 PM
> To: Ping-Hsun Hsieh
> Subject: RE: [R] memory usage grows too fast
> 
> Tena koe Mike
> 
> If I understand you correctly, you should be able to use 
> something like:
> 
> apply(yourMatrix, 1,
>       function(x) length(x[x==yourPattern])) / ncol(yourMatrix)
> 
> I see you've divided by nrow(yourMatrix) so perhaps I am missing
> something.
> 
> HTH ...
> 
> Peter Alspach
> 
>  
> 
> > -----Original Message-----
> > From: r-help-bounces at r-project.org 
> > [mailto:r-help-bounces at r-project.org] On Behalf Of Ping-Hsun Hsieh
> > Sent: Friday, 15 May 2009 11:22 a.m.
> > To: r-help at r-project.org
> > Subject: [R] memory usage grows too fast
> > 
> > Hi All,
> > 
> > I have a 1000x1000000 matrix. 
> > The calculation I would like to do is actually very simple: 
> > for each row, calculate the frequency of a given pattern. For 
> > example, a toy dataset is as follows.
> > 
> > Col1	Col2	Col3	Col4
> > 01	02	02	00		=> Freq of "02" is 0.5
> > 02	02	02	01		=> Freq of "02" is 0.75
> > 00	02	01	01		...
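> > 
> > In R form, the toy data above (a sketch; the expected result is
> > given only for checking):
> > 
> > 	toy <- matrix(c("01","02","02","00",
> > 	                "02","02","02","01",
> > 	                "00","02","01","01"),
> > 	              nrow = 3, byrow = TRUE)
> > 	# expected result: c(0.50, 0.75, 0.25)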
> > 
> > My code to find the pattern "02" is quite simple:
> > 
> > OccurrenceRate_Fun<-function(dataMatrix)
> > {
> >   tmp<-NULL
> >   # each column of tmpMatrix is match(row, "02"):
> >   # 1 where the element is "02", NA elsewhere
> >   tmpMatrix<-apply(dataMatrix,1,match,"02")
> >   for (i in 1:ncol(tmpMatrix))
> >   {
> >     # count the 1s (table() drops NAs) and divide by the row length;
> >     # assumes every row contains at least one "02"
> >     tmpRate<-table(tmpMatrix[,i])[[1]]/nrow(tmpMatrix)
> >     tmp<-c(tmp,tmpRate)
> >   }
> >   rm(tmpMatrix)
> >   rm(tmpRate)
> >   gc()
> >   return(tmp)
> > }
> > 
> > The problem is that the memory usage grows very fast and is hard
> > to manage on machines with less RAM.
> > Could anyone please give me some comments on how to reduce 
> > the space complexity in this calculation?
> > 
> > Thanks,
> > Mike
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide 
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> > 
> 



