[R] memory usage grows too fast

Ping-Hsun Hsieh hsiehp at ohsu.edu
Fri May 15 18:57:45 CEST 2009


Thanks to Peter, William, and Hadley for their help.
Your code is much more concise than mine.  :P
 
William's and Hadley's suggestions are the same. Here is their code.

	f <- function(dataMatrix) rowMeans(dataMatrix == "02")

And Peter's code is the following.

	apply(yourMatrix, 1, function(x) length(x[x==yourPattern]))/ncol(yourMatrix)
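For reference, both approaches can be checked against the small toy matrix from my original post (reconstructed here; the function names f1/f2 are just labels I've added for the comparison):

```r
# Toy matrix from the original post
m <- matrix(c("01", "02", "02", "00",
              "02", "02", "02", "01",
              "00", "02", "01", "01"),
            nrow = 3, byrow = TRUE)

# Vectorised version (William/Hadley): the comparison yields a logical
# matrix, and rowMeans() of TRUE/FALSE values is the per-row frequency.
f1 <- function(dataMatrix) rowMeans(dataMatrix == "02")

# apply() version (Peter): count matches in each row, divide by row length.
f2 <- function(dataMatrix)
  apply(dataMatrix, 1, function(x) length(x[x == "02"])) / ncol(dataMatrix)

f1(m)  # 0.50 0.75 0.25
f2(m)  # same result
```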


In terms of running time, the first one ran faster than the latter on my dataset (2.5 min vs. 6.4 min).
Its memory consumption, however, was much higher (>8 GB vs. ~3 GB).

Any thoughts? My guess is that rowMeans created extra copies to perform its calculation, but I am not sure.
I am also interested in understanding ways to handle memory issues in general. Hope someone could shed light on this for me. :)
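The peak-memory difference is consistent with the comparison `dataMatrix == "02"` materialising a full nrow x ncol logical matrix before rowMeans() ever runs. One common way to cap that peak is to process the matrix in blocks of rows, so the temporary logical matrix is only ever blockSize x ncol. A sketch (the function name and block size are my own choices, not from the thread):

```r
# Compute per-row frequency of `pattern` in row blocks, trading a little
# speed for a much smaller temporary comparison matrix.
rowFreqBlocked <- function(dataMatrix, pattern = "02", blockSize = 1000) {
  n <- nrow(dataMatrix)
  out <- numeric(n)
  for (s in seq(1, n, by = blockSize)) {
    e <- min(s + blockSize - 1, n)
    # The logical temporary here is at most blockSize x ncol(dataMatrix)
    out[s:e] <- rowMeans(dataMatrix[s:e, , drop = FALSE] == pattern)
  }
  out
}
```

The results should match the one-shot rowMeans(dataMatrix == "02") exactly; only the size of the transient copy changes.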

Best,
Mike

-----Original Message-----
From: Peter Alspach [mailto:PAlspach at hortresearch.co.nz] 
Sent: Thursday, May 14, 2009 4:47 PM
To: Ping-Hsun Hsieh
Subject: RE: [R] memory usage grows too fast

Tena koe Mike

If I understand you correctly, you should be able to use something like:

apply(yourMatrix, 1, function(x)
length(x[x==yourPattern]))/ncol(yourMatrix)

I see you've divided by nrow(yourMatrix) so perhaps I am missing
something.

HTH ...

Peter Alspach

 

> -----Original Message-----
> From: r-help-bounces at r-project.org 
> [mailto:r-help-bounces at r-project.org] On Behalf Of Ping-Hsun Hsieh
> Sent: Friday, 15 May 2009 11:22 a.m.
> To: r-help at r-project.org
> Subject: [R] memory usage grows too fast
> 
> Hi All,
> 
> I have a 1000x1000000 matrix. 
> The calculation I would like to do is actually very simple: 
> for each row, calculate the frequency of a given pattern. For 
> example, a toy dataset is as follows.
> 
> Col1	Col2	Col3	Col4
> 01	02	02	00		=> Freq of "02" is 0.5
> 02	02	02	01		=> Freq of "02" is 0.75
> 00	02	01	01		...
> 
> My code is quite simple as the following to find the pattern "02".
> 
> OccurrenceRate_Fun <- function(dataMatrix)
> {
>   tmp <- NULL
>   # match() gives 1 where an element equals "02" and NA otherwise
>   tmpMatrix <- apply(dataMatrix, 1, match, "02")
>   for (i in 1:ncol(tmpMatrix))
>   {
>     tmpRate <- table(tmpMatrix[, i])[[1]] / nrow(tmpMatrix)
>     tmp <- c(tmp, tmpRate)  # was tmpHET, an undefined name
>   }
>   rm(tmpMatrix)
>   rm(tmpRate)
>   return(tmp)  # note: code after return() never runs, so gc() was unreachable
> }
> 
> The problem is the memory usage grows very fast and hard to 
> be handled on machines with less RAM.
> Could anyone please give me some comments on how to reduce 
> the space complexity in this calculation?
> 
> Thanks,
> Mike
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 

