[R] Suspected memory leak with R v.2.5.x and large matrices with dimnames set

Peter Waltman waltman at cs.nyu.edu
Sat Aug 18 08:49:51 CEST 2007


   Hi -

   Admittedly, this may not be the most sophisticated memory profiling
   ever performed, but when watching R with unix's top command, I'm
   noticing a notable memory leak when using R with a large matrix that
   has dimnames set.

   To allow people to reproduce the problem I'm seeing, I've added a
   small (< 50 lines) code snippet at the end of this email.

   I'm seeing this problem on both a MacOS box using R v.2.5.1 and a Unix
   box (x86_64) running R v.2.5.0. The output from sessionInfo() for
   both machines is below.

   What I'm seeing is that when I create a 20k x 2k matrix that does not
   have any dimnames set, if I call a function (the f() function below)
   that makes a couple of local copies of subsets of the matrix and then
   returns the result of some statistical massaging, R works mostly fine
   (more on this below).

   However, if I set the dimnames (currently commented out in the code
   snippet below), and then call from the R command interpreter:

     res <- sapply( 1:10, function(i) { cat(i, "\n"); f() } )
     gc()
     rm( list=ls() )
     gc()

   unix's top command reports that R has a memory footprint of roughly
   2 GB (1.2 GB on the MacOS box), although R's gc() command reports for
   this 'empty' instance of R:

     > gc()
              used (Mb) gc trigger  (Mb)  max used   (Mb)
     Ncells 236823 12.7     467875  25.0    467875   25.0
     Vcells 120446  1.0  109363282 834.4 155806232 1188.8
     >
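
For reference, the cell counts in the gc() output above can be converted to megabytes by hand. This is a minimal sketch, assuming 8 bytes per Vcell (the size of a double) and 1 Mb = 1024^2 bytes; `to_mb` is a helper name introduced here for illustration only.

```r
# Assumption: each Vcell is 8 bytes and 1 Mb = 1024^2 bytes.
vcell_bytes <- 8
to_mb <- function(vcells) vcells * vcell_bytes / 1024^2

# Vcells figures copied from the gc() output above.
gc_vcells_mb <- to_mb(c(used = 120446, trigger = 109363282, max_used = 155806232))
print(round(gc_vcells_mb, 2))

# Size of the numeric data in a 20000 x 2000 matrix of doubles.
matrix_mb <- to_mb(20000 * 2000)
print(round(matrix_mb, 2))
```

Under these assumptions the converted values agree with the (Mb) columns to within the 0.1 Mb precision that gc() prints, and the 20k x 2k matrix alone accounts for about 305 Mb.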

   As I said, if the matrix does not have the dimnames set, the same
   procedure produces the same output from R's gc() command, though
   unix's top command reports that R's memory footprint is actually >270
   MB. I'm not sure if that's just a baseline level of R's memory needs.
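
The comparison between gc() and top can also be made from inside the R session. The following is only a sketch: `r_used_mb` and `os_rss_mb` are hypothetical helper names, the second assumes a Unix-like system whose `ps` accepts `-o rss=` and reports kilobytes, and `system2()` assumes a more recent R than the 2.5.x releases discussed here.

```r
# Memory in use according to R itself: sum of the "(Mb)" column that
# follows "used" in the gc() matrix (Ncells + Vcells).
r_used_mb <- function() {
  g <- gc()
  sum(g[, 2])
}

# Resident set size of this R process according to the OS, in Mb.
# Assumption: `ps -o rss= -p PID` is available and prints kilobytes.
# Returns NA if the command cannot be run or its output is not numeric.
os_rss_mb <- function() {
  out <- tryCatch(
    system2("ps", c("-o", "rss=", "-p", Sys.getpid()),
            stdout = TRUE, stderr = FALSE),
    error = function(e) character(0),
    warning = function(w) character(0)
  )
  kb <- suppressWarnings(as.numeric(trimws(out[1])))
  if (length(kb) == 0 || is.na(kb)) NA_real_ else kb / 1024
}

cat("R reports (Mb): ", r_used_mb(), "\n")
cat("OS reports (Mb):", os_rss_mb(), "\n")
```

Calling both helpers before and after the sapply() loop would show whether the gap between the two numbers grows only when dimnames are set.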

   I see this on both OSes I'm using and both versions of R (v.2.5.x).
   If I'm doing something wrong in my code below which is causing this
   issue, please let me know, but it's fairly vanilla code so I'm not
   sure.

   Thanks,
   Peter Waltman

   sessionInfo() output:

     Mac:

     > sessionInfo()
     R version 2.5.1 (2007-06-27)
     powerpc-apple-darwin8.9.1
     locale:
     C
     attached base packages:
     [1] "stats"     "graphics"  "grDevices" "utils"     "datasets"  "methods"
     [7] "base"
     >

     Unix:

     > sessionInfo()
     R version 2.5.0 (2007-04-23)
     x86_64-unknown-linux-gnu
     locale:
     LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C
     attached base packages:
     [1] "stats"     "graphics"  "grDevices" "utils"     "datasets"  "methods"
     [7] "base"
     >

   test.R:

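     # f() reads the global matrix `val` defined below.
     f <- function() {
       # pick 750 random columns and 15 random rows
       my.cols <- sample( ncol( val ), 750 )
       my.r <- val[ sample( nrow( val ), 15 ), my.cols ]
       # column means over the 15 sampled rows
       avg.rows <- apply( my.r, 2, mean, na.rm=TRUE )
       rm( my.r )
       gc()
       # all rows, same 750 columns
       my.r.all <- val[ , my.cols ]

       # subtract the column means from every row; result is 750 x 20000
       devs <- apply( my.r.all, 1, "-", avg.rows )
       rm( my.r.all )
       gc()
       # variance of the deviations for each original row
       apply( devs, 2, var, na.rm=TRUE )
     }

     # Without dimnames:
     val <- matrix( rnorm( 20000*2000 ), 20000, 2000 )
     # With dimnames (uncomment to reproduce the problem):
     # val <- matrix( rnorm( 20000*2000 ), 20000, 2000,
     #                dimnames=list( paste( "AT2G", 1:20000, sep="" ),
     #                               paste( "AT2Gcol", 1:2000, sep="" ) ) )
     gc()
     # res <- sapply( 1:10, function(i) f() )   # works fine if dimnames aren't set
     # rm( list=ls() )
     # gc()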
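
As a point of comparison for the centering step in f(), the same per-row variances can be computed with sweep() instead of apply( ..., 1, "-", ... ). This is a sketch on a deliberately small matrix, not a confirmed fix: `val.small` and the reduced sizes are assumptions chosen so it runs quickly, and whether this form changes the memory behaviour reported here has not been checked.

```r
set.seed(1)

# Small stand-in for the 20000 x 2000 matrix, with dimnames set.
val.small <- matrix( rnorm( 200 * 30 ), 200, 30,
                     dimnames = list( paste( "AT2G", 1:200, sep = "" ),
                                      paste( "AT2Gcol", 1:30, sep = "" ) ) )

my.cols  <- sample( ncol( val.small ), 10 )
avg.rows <- apply( val.small[ sample( nrow( val.small ), 15 ), my.cols ],
                   2, mean, na.rm = TRUE )
my.r.all <- val.small[ , my.cols ]

# Form used in f(): apply over rows returns the transposed layout (10 x 200).
devs      <- apply( my.r.all, 1, "-", avg.rows )
res.apply <- apply( devs, 2, var, na.rm = TRUE )

# sweep() subtracts avg.rows from each column and keeps the 200 x 10 layout.
centered  <- sweep( my.r.all, 2, avg.rows )
res.sweep <- apply( centered, 1, var, na.rm = TRUE )

print( all.equal( res.apply, res.sweep ) )
```

Both forms yield one variance per row of the input, named by the row names, so swapping one for the other in a test run would help narrow down whether the row-wise apply() call is involved.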

