[R] rbind wastes memory
Duncan Murdoch
murdoch at stats.uwo.ca
Mon May 30 16:08:01 CEST 2005
lutz.thieme at amd.com wrote:
> Hello everybody,
>
> if I try to (r)bind a number of large data frames, I run out of memory because R
> seems to waste memory and "forget" to release it.
>
> For example, I have 10 files. Each file contains a large data frame "ds" (3500 columns
> by 800 rows) which needs ~20 MB of RAM if it is loaded as the only object.
> Now I try to bind all the data frames into one large one and need more than 1165 MB (!)
> of RAM (to simplify the R code, I use the same file ten times):
>
> ________ start example 1 __________
> load(myFile)
> ds.tmp <- ds
> for (Cycle in 1:10) {
>     ds.tmp <- rbind(ds.tmp, ds)
> }
> ________ end example 1 __________
>
>
>
> Stepping into the details, I found the following (the comment shows RAM usage after
> each line was executed):
> load(myFile)            # 40MB (19MB for R itself)
> ds.tmp <- ds            # 40MB; => only a pointer seems to be copied
> x <- rbind(ds.tmp, ds)  # 198MB
> x <- rbind(ds.tmp, ds)  # 233MB; the same instruction a second time leads to
>                         # 35MB more RAM usage - why?
I'm guessing your problem is fragmented memory. You are creating big
objects, then making them bigger. Each rbind() builds a completely new
object and copies both of its arguments into it, so for a moment the old
and new versions exist side by side. R then needs to find large
contiguous allocations for the replacements, but they won't fit in the
holes left by the things you've deleted, so those holes are left empty.
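You can see the effect with a small self-contained demonstration (just a
sketch; the dimensions are made up, and smaller than your 3500 columns so
it runs quickly):

gc(reset = TRUE)                 # reset the "max used" counters
ds <- as.data.frame(matrix(0, nrow = 800, ncol = 350))
ds.tmp <- ds
for (Cycle in 1:10) {
    ds.tmp <- rbind(ds.tmp, ds)  # each call builds a brand-new, bigger copy
}
print(object.size(ds.tmp), units = "Mb")   # size of the final object
gc()                             # compare the "max used" column with that size

The peak shown in the "max used" column is what drives the process size
up, even though much of that memory is free inside R again afterwards.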
A solution to this is to use two passes: first figure out how much
space you need, then allocate it and fill it. E.g.
rows <- numeric(10)
for (Cycle in 1:10) {
    rows[Cycle] <- .... some calculation based on the data ...
}
# allocate the whole result once, then fill it in place
ds.tmp <- data.frame(x = double(sum(rows)), y = double(sum(rows)), ...)
for (Cycle in 1:10) {
    ds.tmp[ appropriate rows, ] <- new data
}
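For your case, where each file contains a data frame called "ds", a
complete version might look something like this (only a sketch: 'myFiles'
is a hypothetical vector of your ten file names, and I'm assuming all the
pieces have identical columns):

## pass 1: find out how many rows each piece contributes
rows <- integer(length(myFiles))
for (Cycle in seq_along(myFiles)) {
    load(myFiles[Cycle])              # creates 'ds' in the workspace
    rows[Cycle] <- nrow(ds)
}

## allocate the full result once, using the first piece as a column template
load(myFiles[1])
big <- ds[rep(1, sum(rows)), ]
rownames(big) <- NULL

## pass 2: fill each block of rows in place
end <- cumsum(rows)
start <- end - rows + 1
for (Cycle in seq_along(myFiles)) {
    load(myFiles[Cycle])
    big[start[Cycle]:end[Cycle], ] <- ds
}

That way the big object is allocated exactly once, and the second loop
only overwrites rows that already exist.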
Duncan Murdoch