[R] R + memory of objects
Marc Jekel
mjekel at uni-bonn.de
Fri Dec 2 16:17:49 CET 2011
Dear R community,
I am still struggling a bit with how R does memory allocation and how to optimize my code to minimize the working-memory load. Simon (thanks!) and others gave me the hint to use the command "gc()" to clean up memory, which works quite nicely but appears to me to be more of a "fix" than a solution to the underlying problem.
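(For completeness, this is roughly how I call it; as far as I understand, gc() triggers a collection and reports the Ncells/Vcells currently in use, and gc(reset = TRUE) additionally resets the "max used" columns:)
gc()              # trigger a garbage collection and report Ncells/Vcells currently in use
gc(reset = TRUE)  # same, but additionally reset the "max used" statistics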
To give you an impression of what I am talking about, here is a short code example; for each computational step I give a rough measure (from the system's task monitor) of the working memory used (latest 64-bit R on 64-bit Windows 7, 2 cores, approx. 4 GB RAM):
##########################
# example 1:
y = matrix(rep(1, 50000000), nrow = 50000000/2, ncol = 2)
# used working memory increases from 1044 MB --> 1808 MB
# (the same command again:)
y = matrix(rep(1, 50000000), nrow = 50000000/2, ncol = 2)
# 1808 MB --> 2178 MB Why does memory increase?
# (give the matrix column names)
colnames(y) = c("col1", "col2")
# 2178 MB --> 1781 MB Why does the size of an object decrease if I assign column labels?
###
# example 2:
y = matrix(rep(1, 50000000), nrow = 50000000/2, ncol = 2)
# 1016 MB --> 1780 MB
y = data.frame(y)
# increase from 1780 MB --> 3315 MB
##########################
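(As a side note: I assume one could also cross-check object sizes from within R using object.size(); if I understand correctly, it measures only the object itself and not any temporary copies made along the way, so it would not show the peaks I see in the task monitor. A minimal sketch:)
##########################
# cross-check from within R (object.size() measures the final object only)
y = matrix(rep(1, 50000000), nrow = 50000000/2, ncol = 2)
print(object.size(y), units = "MB")   # size of the matrix object itself
gc()                                  # memory currently used by R (Ncells/Vcells)
##########################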
Why does it take so much extra memory to store this matrix as a data.frame?
It does not seem to be the object per se (i.e. that a data.frame inherently needs that much more memory), because after a gc() the memory usage drops to 1387 MB. Does this mean it may be more memory-efficient to avoid data.frames and work with matrices only?
This puzzles me a lot. From my experience, these effects become even more pronounced for larger objects.
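(To make the matrix vs. data.frame comparison concrete, I assume one could measure both representations of the same data directly, e.g. on a smaller example:)
##########################
# sketch: the same data stored as a matrix and as a data.frame
m = matrix(rep(1, 1000000), nrow = 500000, ncol = 2)
d = data.frame(m)
print(object.size(m), units = "MB")   # size as a matrix
print(object.size(d), units = "MB")   # size as a data.frame
##########################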
As an anecdotal comparison: because of these memory problems I also used Stata in my last project, and I could do a lot of variable manipulations on the same (!) data with significantly less memory needed (we are talking about gigabytes less).
Best,
Marc