[R] Suspected memory leak with R v.2.5.x and large matrices with dimnames set
Peter Waltman
waltman at cs.nyu.edu
Sat Aug 18 08:49:51 CEST 2007
Hi -
Admittedly, this may not be the most sophisticated memory profiling
ever performed, but using unix's top command, I'm seeing what looks
like a sizable memory leak when using R with a large matrix that has
dimnames set.
To allow people to reproduce the problem I'm seeing, I've added a
small (< 50 lines) code snippet at the end of this email.
I'm seeing this problem on both a MacOS box using R v.2.5.1 and a Unix
box (x86_64) running R v.2.5.0. The output from sessionInfo() for
both machines is below.
What I'm seeing is that when I create a 20k x 2k matrix that does not
have any dimnames set, if I call a function (the f() function below)
that makes a couple of local copies of subsets of the matrix and then
returns the result of some statistical massaging, R works mostly fine
(more on this below).
However, if I set the dimnames (currently commented out in the code
snippet below), and then call from the R command interpreter:
res <- sapply( 1:10, function(i) { cat(i, "\n"); f() } )
gc()
rm( list=ls() )
gc()
unix's top command reports that R has a memory footprint of roughly
2 GB (1.2 GB on the MacOS box), although R's gc() command reports for
this 'empty' instance of R:
> gc()
         used (Mb) gc trigger  (Mb)  max used   (Mb)
Ncells 236823 12.7     467875  25.0    467875   25.0
Vcells 120446  1.0  109363282 834.4 155806232 1188.8
>
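In case it helps anyone compare these figures without reading them off the screen: gc() returns its table as a numeric matrix, so the Mb columns can be picked out directly. The following is only a rough sketch. The helper name gc.summary is made up for illustration, and it assumes the second column is "used (Mb)" and the last column is "max used (Mb)", which matches the layout printed above.

```r
# Hypothetical helper (not part of R): pull the Mb figures out of gc()'s result.
gc.summary <- function() {
  g <- gc()                                   # numeric matrix, rows Ncells and Vcells
  c( used.mb     = sum( g[ , 2 ] ),           # 2nd column: Mb currently in use
     max.used.mb = sum( g[ , ncol( g ) ] ) )  # last column: high-water mark in Mb
}

before <- gc.summary()
x <- rnorm( 1e6 )          # roughly 8 Mb of doubles
during <- gc.summary()
rm( x )
after <- gc.summary()      # "used" should drop again, "max used" should not

print( rbind( before, during, after ) )
```

The "max used" column is a high-water mark since R started (or since the last gc( reset=TRUE )), so it staying high after rm() is expected and is not by itself a sign of a leak.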
As I said, if the matrix does not have the dimnames set, the same
procedure will produce the same output from R's gc() command, though
unix's top command reports that R's memory footprint is actually >270
MB. I'm not sure if that's just a basal level of R's memory needs.
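To put the two views side by side from inside R rather than switching to top, something along these lines might work. It is a sketch under assumptions: rss.mb is a made-up helper name, and it assumes a Unix-alike where `ps -o rss= -p PID` prints the resident size in kilobytes. It returns NA if ps cannot be run.

```r
# Hypothetical helper: resident set size of this R process in Mb, as ps reports it.
# Assumes a Unix-alike ps that accepts "-o rss= -p PID" and reports kilobytes.
rss.mb <- function() {
  out <- tryCatch(
    suppressWarnings(
      system( paste( "ps -o rss= -p", Sys.getpid() ), intern = TRUE ) ),
    error = function( e ) character( 0 ) )
  if ( length( out ) == 0 ) return( NA_real_ )   # ps missing or failed
  suppressWarnings( as.numeric( out[ 1 ] ) ) / 1024
}

r.view  <- sum( gc()[ , 2 ] )   # Mb that R itself counts as in use
os.view <- rss.mb()             # Mb the OS counts as resident for this process
cat( "gc() used (Mb):", r.view, "\n" )
cat( "ps rss   (Mb):", os.view, "\n" )
```

Calling this before and after the sapply()/rm()/gc() sequence would give the same comparison described above, in one place.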
I see this on both OS's I'm using and both versions of R (v.2.5.x).
If I'm doing something wrong in my code below which is causing this
issue, please let me know, but it's fairly vanilla code so I'm not
sure.
Thanks,
Peter Waltman
SessionInfo output:
Mac:
> sessionInfo()
R version 2.5.1 (2007-06-27)
powerpc-apple-darwin8.9.1
locale:
C
attached base packages:
[1] "stats"     "graphics"  "grDevices" "utils"     "datasets"  "methods"
[7] "base"
>
Unix:
> sessionInfo()
R version 2.5.0 (2007-04-23)
x86_64-unknown-linux-gnu
locale:
LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;
LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;
LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;
LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C
attached base packages:
[1] "stats"     "graphics"  "grDevices" "utils"     "datasets"  "methods"
[7] "base"
>
test.R:
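# Uses the global matrix 'val'.
f <- function() {
    # pick 750 random columns and 15 random rows
    my.cols <- sample( ncol( val ), 750 )
    my.r <- val[ sample( nrow( val ), 15 ), my.cols ]
    # per-column mean over the sampled rows
    avg.rows <- apply( my.r, 2, mean, na.rm=TRUE )
    rm( my.r )
    gc()
    # subtract those means from every row of the selected columns
    # (apply returns the result transposed: 750 x nrow( val ))
    my.r.all <- val[ , my.cols ]
    devs <- apply( my.r.all, 1, "-", avg.rows )
    rm( my.r.all )
    gc()
    # variance of the deviations for each original row
    apply( devs, 2, var, na.rm=TRUE )
}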
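# without dimnames:
val <- matrix( rnorm( 20000 * 2000 ), 20000, 2000 )
# with dimnames (commented out; use instead of the line above):
# val <- matrix( rnorm( 20000 * 2000 ), 20000, 2000,
#                dimnames = list( paste( "AT2G", 1:20000, sep="" ),
#                                 paste( "AT2Gcol", 1:2000, sep="" ) ) )
gc()
# res <- sapply( 1:10, function(i) f() )  # --- works fine if dimnames aren't set
# rm( list=ls() )
# gc()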
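For anyone who wants to try this without allocating a 20k x 2k matrix, here is a scaled-down sketch of the same steps that switches dimnames on and off and records what gc() reports after cleanup. The names make.val and run.once and the smaller sizes (2000 x 200, 75 columns) are made up for illustration. At this size the effect seen above may well not show up, so treat it as a harness to scale up rather than a reproduction.

```r
# Hypothetical, scaled-down variant of test.R (2000 x 200 instead of 20k x 2k).
make.val <- function( with.names, nr = 2000, nc = 200 ) {
  dn <- NULL
  if ( with.names )
    dn <- list( paste( "AT2G", 1:nr, sep="" ), paste( "AT2Gcol", 1:nc, sep="" ) )
  matrix( rnorm( nr * nc ), nr, nc, dimnames = dn )
}

# Same steps as f(), but taking the matrix as an argument and
# reporting the Mb that gc() counts as used once everything is removed.
run.once <- function( with.names, n.cols = 75 ) {
  val  <- make.val( with.names )
  cols <- sample( ncol( val ), n.cols )
  avg  <- apply( val[ sample( nrow( val ), 15 ), cols ], 2, mean, na.rm=TRUE )
  devs <- apply( val[ , cols ], 1, "-", avg )   # n.cols x nrow( val )
  res  <- apply( devs, 2, var, na.rm=TRUE )     # one variance per original row
  rm( val, devs )
  list( n = length( res ), used.mb = sum( gc()[ , 2 ] ) )
}

no.names   <- run.once( FALSE )
with.names <- run.once( TRUE )
cat( "used after cleanup, no dimnames (Mb):  ", no.names$used.mb, "\n" )
cat( "used after cleanup, with dimnames (Mb):", with.names$used.mb, "\n" )
```

Passing the matrix around inside run.once() instead of using a global also means nothing is left behind in the workspace between the two runs.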